Patent abstract:
A video decoder is configured to decode a bidirectionally inter-predicted block of video data by: locating a first predictive block in a first reference image using a first MV; locating a second predictive block in a second reference image using a second MV; determining a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block; determining a first final predictive sub-block for the block of video data based on the first amount of BIO motion; determining a second amount of BIO motion for a second sub-block of the first predictive block; determining a second final predictive sub-block for the block of video data based on the second amount of BIO motion; and determining a final predictive block based on the first final predictive sub-block and the second final predictive sub-block.
Publication number: BR112019013684A2
Application number: R112019013684
Filing date: 2018-01-04
Publication date: 2020-01-28
Inventors: Chuang Hsiao-Chiang; Chen Jianle; Zhang Li; Karczewicz Marta; Chien Wei-Jung; Li Xiang; Chen Yi-Wen
Applicant: Qualcomm Inc.
Primary IPC class:
Patent description:

MOTION VECTOR RECONSTRUCTIONS FOR BIDIRECTIONAL OPTICAL FLOW (BIO) [0001] This application claims the benefit of: U.S. Provisional Patent Application No. 62/442,357, filed on January 4, 2017; and U.S. Provisional Patent Application No. 62/445,152, filed on January 11, 2017, the entire content of each of which is incorporated herein by reference.
TECHNICAL FIELD [0002] This disclosure relates to video coding.
BACKGROUND [0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called smart phones, video teleconferencing devices, video streaming devices and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265 High Efficiency Video Coding (HEVC) and
extensions of such standards. Video devices can transmit, receive, encode, decode and/or store digital video information more efficiently by implementing such video coding techniques.
[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video coding, a video slice (that is, a video frame or a part of a video frame) can be partitioned into video blocks, which may also be referred to as tree blocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded slice (I) of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. Video blocks in an inter-coded slice (P or B) of an image can use spatial prediction with respect to reference samples in neighboring blocks in the same image or temporal prediction with respect to reference samples in other reference images. Images can be referred to as frames, and reference images can be referred to as reference frames.
[0005] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be encoded and the predictive block. An inter-coded block is encoded according to a motion vector that indicates a block of reference samples that form the predictive block, and the residual data that
indicate the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data can be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding can be applied in order to achieve even more compression.
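To make the pipeline just described concrete, the following is a minimal Python sketch of the residual, transform and quantization steps. It is illustrative only: the 2-D DCT and the uniform quantizer stand in for the integer transforms and rate-distortion-optimized quantization that real codecs use, and all names here are assumptions rather than any normative API.

```python
import numpy as np
from scipy.fftpack import dct

def transform_and_quantize(original, predictive, qstep):
    # Residual data: pixel differences between the original block to be
    # encoded and the predictive block.
    residual = original.astype(np.float64) - predictive.astype(np.float64)
    # Transform from the pixel domain to a transform domain (2-D DCT as a
    # stand-in for the codec's integer transform).
    coeffs = dct(dct(residual, axis=0, norm='ortho'), axis=1, norm='ortho')
    # Uniform quantization of the transform coefficients (illustrative).
    return np.round(coeffs / qstep).astype(np.int32)
```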
SUMMARY [0006] In general, this disclosure describes techniques related to bidirectional optical flow (BIO) in video coding. The techniques of this disclosure can be used in conjunction with existing video codecs, such as High Efficiency Video Coding (HEVC), or be an efficient coding tool for future video coding standards.
[0007] According to one example of this disclosure, a method of decoding video data includes determining that a block of video data is encoded using a bidirectional inter-prediction mode; determining a first motion vector (MV) for the block, wherein the first MV points to a first reference image; determining a second MV for the block, wherein the second MV points to a second reference image, the first reference image being different from the second reference image; locating a first predictive block in the first reference image using the first MV; locating a second predictive block in the second reference image using the second MV; determining a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block; determining a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block and the first amount of BIO motion; determining a second amount of BIO motion for a second sub-block of the first predictive block; determining a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block and the second amount of BIO motion; determining a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and outputting an image of video data comprising a decoded version of the block of video data.
[0008] According to another example of this disclosure, an apparatus for decoding video data includes a memory configured to store video data; and one or more processors configured to determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine a first motion vector (MV) for the block, wherein the first MV points to a first reference image; determine a second MV for the block, wherein the second MV points to a second reference image, the first reference image being different from the second reference image; locate a first predictive block in the first reference image using the first MV; locate a second predictive block in the second reference image using the second MV; determine a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block; determine a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block and the first amount of BIO motion; determine a second amount of BIO motion for a second sub-block of the first predictive block; determine a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block and the second amount of BIO motion; determine a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and output an image of video data comprising a decoded version of the block of video data.
[0009] According to another example of this disclosure, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to determine that a block of video data is encoded using a bidirectional inter-prediction mode; determine a first motion vector (MV) for the block, wherein the first MV points to a first reference image; determine a second MV for the block, wherein the second MV points to a second reference image, the first reference image being different from the second reference image; locate a first predictive block in the first reference image using the first MV; locate a second predictive block in the second reference image using the second MV; determine a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block; determine a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block and the first amount of BIO motion; determine a second amount of BIO motion for a second sub-block of the first predictive block; determine a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block and the second amount of BIO motion; determine a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and output an image of video data comprising a decoded version of the block of video data.
[0010] According to another example of this disclosure, an apparatus for decoding video data includes means for determining that a block of video data is encoded using a bidirectional inter-prediction mode; means for determining a first motion vector (MV) for the block, wherein the first MV points to a first reference image; means for determining a second MV for the block, wherein the second MV points to a second reference image, the first reference image being different from the second reference image; means for locating a first predictive block in the first reference image using the first MV; means for locating a second predictive block in the second reference image using the second MV; means for determining a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block; means for determining a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block and the first amount of BIO motion; means for determining a second amount of BIO motion for a second sub-block of the first predictive block; means for determining a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block and the second amount of BIO motion; means for determining a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and means for producing an image of video data comprising a decoded version of the block of video data.
[0011] The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects and advantages of these techniques will be apparent from the description, drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS [0012] Figure 1 is a block diagram that illustrates an exemplary video encoding and decoding system that can use techniques for bidirectional optical flow.
[0013] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME), such as a block matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC).
[0014] Figure 3 is a conceptual diagram that illustrates an example of bilateral ME, such as a BMA performed for MC-FRUC.
[0015] Figure 4A shows spatial neighboring MV candidates for merge mode.
[0016] Figure 4B shows spatial neighboring MV candidates for AMVP modes.
[0017] Figure 5A shows an example of a TMVP candidate.
[0018] Figure 5B shows an example of MV scaling.
[0019] Figure 6 shows an example of an optical flow trajectory.
[0020] Figure 7 shows an example of BIO for an 8x4 block.
[0021] Figure 8 shows an example of a modified BIO for an 8x4 block.
[0022] Figures 9A and 9B show exemplary illustrations of sub-blocks where the OBMC applies.
[0023] Figures 10A-10D show examples of OBMC weightings.
[0024] Figure 11 shows an example of the proposed BIO for an 8x4 block.
[0025] Figures 12A-12D show examples of the simplified BIO proposed in the OBMC.
[0026] Figure 13 shows an example of a weighting function for a 4x4 sub-block with a 5x5 window.
[0027] Figure 14 is a block diagram that illustrates an example of a video encoder.
[0028] Figure 15 is a block diagram that illustrates an example of a video decoder that can implement techniques for bidirectional optical flow.
[0029] Figure 16 is a flow chart that illustrates an example of operation of a video decoder, according to a technique of this disclosure.
DETAILED DESCRIPTION [0030] In general, the techniques of this disclosure are related to improvements of bidirectional optical flow (BIO) video coding techniques. BIO can be applied during motion compensation. As originally proposed, BIO is used to modify predictive sample values for bidirectionally inter-coded blocks based on an optical flow trajectory in order to determine better
predictive blocks, such as predictive blocks that are closer to an original block of video data. The various techniques of this disclosure can be applied, alone or in any combination, to determine when and whether BIO is performed when predicting blocks of video data, for example, during motion compensation.
[0031] As used in this disclosure, the term video coding generically refers to video encoding or video decoding. Likewise, the term video coder can generically refer to a video encoder or a video decoder. In addition, certain techniques described in this disclosure with respect to video decoding may also apply to video encoding, and vice versa. For example, video encoders and video decoders are often configured to perform the same or reciprocal processes. In addition, video encoders typically perform video decoding as part of the process of determining how to encode video data. Therefore, unless explicitly stated otherwise, it should not be assumed that a technique described with respect to video decoding cannot also be performed by a video encoder, or vice versa.
[0032] This disclosure may also use terms such as current layer, current block, current image, current slice, etc. In the context of this disclosure, the term current is intended to identify a block, image, slice, etc. that is currently being coded, as opposed to, for example, previously coded blocks, images and slices, or blocks, images and slices yet to be coded.
[0033] In general, an image is divided into blocks, each of which can be predictively coded. A video encoder can predict a current block using intra-prediction techniques (using data from the image that includes the current block), inter-prediction techniques (using data from a previously coded image relative to the image that includes the current block), or other techniques, such as intra-block copy, palette mode, dictionary mode, etc. Inter-prediction includes both unidirectional prediction and bidirectional prediction.
[0034] For each inter-predicted block, a video encoder can determine a set of motion information. The set of motion information can contain motion information for the forward and backward prediction directions. Here, the forward and backward prediction directions are the two prediction directions of a bidirectional prediction mode. The terms forward and backward do not have a geometric meaning. Instead, they generally correspond to whether the reference images are to be displayed before (backward) or after (forward) the current image. In some examples, the forward and backward prediction directions may correspond to reference image list 0 (RefPicList0) and reference image list 1 (RefPicList1) of a current image. When only one reference image list is available for an image or slice, only RefPicList0 is available, and the motion information of each block of the slice always refers to an image of RefPicList0 (that is, it is forward).
[0035] In some cases, a motion vector together with its corresponding reference index is used in a decoding process. Such a motion vector with its associated reference index is denoted as a uni-predictive set of motion information.
[0036] For each prediction direction, the motion information contains a reference index and a motion vector. In some cases, for simplicity, a motion vector itself may be referred to in a way that it is assumed to have an associated reference index. A reference index is used to identify a reference image in the current reference image list (RefPicList0 or RefPicList1). A motion vector has a horizontal (x) and a vertical (y) component. In general, the horizontal component indicates a horizontal displacement within a reference image, relative to the position of a current block in a current image, needed to locate the x coordinate of a reference block, while the vertical component indicates a vertical displacement within the reference image, relative to the position of the current block, needed to locate the y coordinate of the reference block.
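As a minimal sketch of how these two components address a reference block, assuming integer-sample motion vectors and a NumPy-style 2-D array for the reference image (an actual codec would also interpolate fractional sample positions and clip coordinates to the image bounds):

```python
def locate_reference_block(ref_image, cur_x, cur_y, mv_x, mv_y, width, height):
    # The horizontal component shifts the x coordinate and the vertical
    # component shifts the y coordinate, relative to the current block
    # position in the current image.
    ref_x = cur_x + mv_x
    ref_y = cur_y + mv_y
    return ref_image[ref_y:ref_y + height, ref_x:ref_x + width]
```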
[0037] Picture order count (POC) values are widely used in video coding standards to identify the display order of an image. Although there are cases in which two images within one coded video sequence may have the same POC value, this typically does not happen within a coded video sequence. Thus, the POC values of images are generally unique and, therefore, can uniquely identify the corresponding images. When multiple coded video sequences are present in a bit stream, images having the same POC value may be closer to each other in terms of decoding order. The POC values of images are typically used for reference image list construction, derivation of reference image sets, as in HEVC, and motion vector scaling.
[0038] E. Alshina, A. Alshina, J.-H. Min, K. Choi, A. Saxena, M. Budagavi, Known tools performance investigation for next generation video coding, ITU - Telecommunications Standardization Sector, STUDY GROUP 16, Question 6, Video Coding Experts Group (VCEG), VCEG-AZ05, June 2015, Warsaw, Poland (hereinafter, Alshina 1), and A. Alshina, E. Alshina, T. Lee, Bi-directional optical flow for improving motion compensation, Picture Coding Symposium (PCS), Nagoya, Japan, 2010 (hereinafter, Alshina 2) described a method called bidirectional optical flow (BIO). BIO is based on pixel-level optical flow. According to Alshina 1 and Alshina 2, BIO is applied only to blocks that have both forward and backward prediction. The BIO, as described in Alshina 1 and Alshina 2, is summarized below:
[0039] Given a pixel value $I_t$ at time $t$, the first-order Taylor expansion of the pixel value is

$$I_t = I_{t_0} + \frac{\partial I_{t_0}}{\partial t}(t - t_0) \quad (A)$$

[0040] $I_{t_0}$ is on the motion trajectory of $I_t$. That is, the motion from $I_{t_0}$ to $I_t$ is considered in the formula.

[0041] Under the assumption of optical flow:

$$\frac{dI}{dt} = \frac{\partial I}{\partial t} + \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} = 0 \;\Rightarrow\; \frac{\partial I}{\partial t} = -\frac{\partial I}{\partial x}\frac{dx}{dt} - \frac{\partial I}{\partial y}\frac{dy}{dt}$$

Let $G_x = \frac{\partial I}{\partial x}$ and $G_y = \frac{\partial I}{\partial y}$ (the gradient), and equation (A) becomes

$$I_t = I_{t_0} - G_{x0}\cdot\frac{dx}{dt}\cdot(t - t_0) - G_{y0}\cdot\frac{dy}{dt}\cdot(t - t_0) \quad (B)$$

[0042] Regarding $\frac{dx}{dt}$ and $\frac{dy}{dt}$, the moving speeds $V_{x0}$ and $V_{y0}$ can be used to represent them.

[0043] Then, equation (B) becomes

$$I_t = I_{t_0} - G_{x0}\cdot V_{x0}\cdot(t - t_0) - G_{y0}\cdot V_{y0}\cdot(t - t_0) \quad (C)$$

[0044] Suppose a forward reference at $t_0$ and a backward reference at $t_1$, and that

$$t_0 - t = t - t_1 = \Delta t = 1$$

[0045] Then:

$$I_t = I_{t_0} - G_{x0} V_{x0}(t - t_0) - G_{y0} V_{y0}(t - t_0) = I_{t_0} + G_{x0} V_{x0} + G_{y0} V_{y0}$$
$$I_t = I_{t_1} - G_{x1} V_{x1}(t - t_1) - G_{y1} V_{y1}(t - t_1) = I_{t_1} - G_{x1} V_{x1} - G_{y1} V_{y1}$$
$$I_t = \frac{I_{t_0} + I_{t_1}}{2} + \frac{(G_{x0} V_{x0} - G_{x1} V_{x1}) + (G_{y0} V_{y0} - G_{y1} V_{y1})}{2} \quad (D)$$

[0046] It is further assumed that $V_{x0} = V_{x1} = V_x$ and $V_{y0} = V_{y1} = V_y$, since the motion is along the trajectory. Then, equation (D) becomes

$$I_t = \frac{I_{t_0} + I_{t_1}}{2} + \frac{V_x\cdot\Delta G_x + V_y\cdot\Delta G_y}{2} \quad (E)$$

where $\Delta G_x = G_{x0} - G_{x1}$ and $\Delta G_y = G_{y0} - G_{y1}$ can be calculated based on the reconstructed references. Since $\frac{I_{t_0} + I_{t_1}}{2}$ is the regular bi-prediction, $\frac{V_x\cdot\Delta G_x + V_y\cdot\Delta G_y}{2}$ is henceforth called the BIO offset for convenience.

[0047] $V_x$ and $V_y$ are derived at both the encoder and the decoder by minimizing the following distortion:

$$\min \sum_{\text{pixels in block}} \left| \left(I_{t_0} + G_{x0} V_x + G_{y0} V_y\right) - \left(I_{t_1} - G_{x1} V_x - G_{y1} V_y\right) \right|^2$$

[0048] With $V_x$ and $V_y$ derived, the final prediction of the block is calculated with equation (E). $V_x$ and $V_y$ are called the BIO motion for convenience.
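The following sketch applies equation (E) to floating-point arrays holding the two motion-compensated predictions and their gradients. It is an illustrative simplification: practical implementations use fixed-point arithmetic, and the array-based interface is an assumption.

```python
def bio_prediction(I0, I1, Gx0, Gy0, Gx1, Gy1, Vx, Vy):
    # Regular bi-prediction: the average of the two motion-compensated blocks.
    bi_pred = (I0 + I1) / 2.0
    # Gradient differences computed from the reconstructed references.
    dGx = Gx0 - Gx1
    dGy = Gy0 - Gy1
    # BIO offset per equation (E), added on top of the regular bi-prediction.
    return bi_pred + (Vx * dGx + Vy * dGy) / 2.0
```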
[0049] In general, a video encoder performs BIO during motion compensation. That is, after the video encoder determines a motion vector for a current block, the video encoder
produces a predictive block for the current block using motion compensation relative to the motion vector. In general, the motion vector identifies the location of a reference block relative to the current block in a reference image. When performing BIO, a video encoder modifies the motion vector on a per-pixel basis for the current block. That is, rather than retrieving each pixel of the reference block as a block unit, according to BIO, the video encoder determines per-pixel modifications to the motion vector for the current block and constructs the reference block such that the reference block includes the reference pixels identified by the motion vector and the per-pixel modification for the corresponding pixel of the current block. Thus, BIO can be used to produce a more accurate reference block for the current block.
[0050] Figure 1 is a block diagram illustrating an example of a video encoding and decoding system 10 that can use techniques for bidirectional optical flow. As shown in Figure 1, system 10 includes a source device 12 that generates encoded video data to be decoded later by a destination device 14. In particular, the source device 12 sends the video data to the destination device 14 by means of a computer-readable medium 16. The source device 12 and the destination device 14 can comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as
so-called smart phones, so-called smart devices, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device 12 and the destination device 14 can be equipped for wireless communication.
[0051] The destination device 14 can receive the encoded video data to be decoded by means of the computer-readable medium 16. The computer-readable medium 16 can comprise any type of medium or device capable of moving the encoded video data from the source device 12 to the destination device 14. In one example, the computer-readable medium 16 can comprise a communication medium to allow the source device 12 to transmit encoded video data directly to the destination device 14 in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can be part of a packet-based network, such as a local area network, a wide area network or a global network, such as the Internet. The communication medium can include routers, switches, base stations or other equipment that may be useful to facilitate the communication from the source device 12 to the
destination device 14.
[0052] In some examples, encoded data can be transmitted from the output interface 22 to a storage device. Similarly, encoded data can be accessed from the storage device via the input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory or any other suitable digital storage media for storing encoded video data. In a further example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device 12. The destination device 14 can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Exemplary file servers include a web server (for a website, for example), an FTP server, network attached storage (NAS) devices or a local disk drive. The destination device 14 can access the encoded video data via any standard data connection, including an Internet connection. This can include a wireless channel (a Wi-Fi connection, for example), a wired connection (such as, for example, DSL, cable modem, etc.), or a combination of both that
is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission or a combination thereof.
[0053] The techniques described in this disclosure can be applied to video coding in support of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.
[0054] In the example of Figure 1, the source device 12 includes a video source 18, a video encoder 20 and an output interface 22. The destination device 14 includes an input interface 28, a video decoder 30 and a display device 32. According to this disclosure, the video encoder 20 of the source device 12 can be configured to apply bidirectional optical flow techniques. In other examples, a source device and a target device may include other components or arrangements. For example, the
source device 12 can receive video data from an external video source 18, such as an external camera. Likewise, the destination device 14 can interface with an external display device, rather than including an integrated display device.
[0055] System 10 illustrated in Figure 1 is merely one example. Techniques for bidirectional optical flow can be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a CODEC. Furthermore, the techniques of this disclosure can be performed by a video preprocessor. The source device 12 and the destination device 14 are merely examples of coding devices in which the source device 12 generates encoded video data for transmission to the destination device 14. In some examples, the devices 12, 14 can operate in a substantially symmetrical manner, such that each of the devices 12, 14 includes video encoding and decoding components. Consequently, system 10 can support one-way or two-way video transmission between video devices 12, 14, such as, for example, for video streaming, video playback, video broadcasting, or video telephony.
[0056] The video source 18 of the source device 12 can include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a
video feed interface to receive video from a video content provider. As a further alternative, the video source 18 can generate computer graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source 18 is a video camera, the source device 12 and the destination device 14 can form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure can be applied to video coding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder 20. The encoded video information can then be transmitted through the output interface 22 onto the computer-readable medium 16.
[0057] The computer-readable medium 16 can include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transient storage media), such as a hard disk, a flash drive, a compact disc, a digital video disc, a Blu-ray disc or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device 12 and send the encoded video data to the destination device 14, via network transmission, for example. In the same way, a computing device in a media production facility, such as a disc stamping facility, can receive encoded video data from the source device 12
and produce a disc containing the encoded video data. Therefore, the computer-readable medium 16 can be understood to include one or more computer-readable media of various forms, in various examples.
[0058] The input interface 28 of the destination device 14 receives information from the computer-readable medium 16. The information from the computer-readable medium 16 can include syntax information defined by the video encoder 20, which is also used by the video decoder 30, and which includes syntax elements that describe characteristics and/or processing of the video data. The display device 32 displays the decoded video data to the user and can comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display or another type of display device.
[0059] Video encoder 20 and video decoder 30 can operate according to one or more video coding standards, such as ITU-T H.264/AVC (Advanced Video Coding) or High Efficiency Video Coding (HEVC), also referred to as ITU-T H.265. H.264 is described in International Telecommunication Union, Advanced video coding for generic audiovisual services, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, H.264, June 2011. H.265 is described in International
Telecommunication Union, High Efficiency Video Coding, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, April 2015. The techniques of this disclosure can also be applied to any other previous or future video coding standards as an efficient coding tool.
[0060] Other video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and the Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions of H.264, as well as HEVC extensions, such as the range extension, the multiview extension (MV-HEVC) and the scalable extension (SHVC). In April 2015, the Video Coding Experts Group (VCEG) started a new research project aimed at a new generation of video coding standards. The reference software is called HM-KTA.
[0061] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high dynamic range coding). The groups are working together on this exploration activity in a joint collaborative effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology
designs proposed by their experts in this area. The JVET met for the first time from 19 to 21 October 2015. The latest version of the reference software, i.e., Joint Exploration Model 3 (JEM3), can be downloaded at: https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-4.0/ A description of the Joint Exploration Test Model 3 (JEM3) algorithm can be found in JVET-D1001.
[0062] Certain video coding techniques, such as those of H.264 and HEVC that are related to the techniques of this disclosure, are described in this disclosure. Certain techniques in this disclosure may be described with reference to H.264 and/or HEVC to aid understanding, but the techniques described are not necessarily limited to H.264 or HEVC and can be used in conjunction with other coding standards and other coding tools.
[0063] Although not shown in Figure 1, in some aspects, video encoder 20 and video decoder 30 can each be integrated with an audio encoder and decoder, and can include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, MUX-DEMUX units can conform to the ITU H.223 multiplexer protocol or other protocols, such as the user datagram protocol (UDP).
[0064] In HEVC and other video coding standards, a video sequence typically
includes a series of images. Images can also be referred to as frames. An image can include three sample arrays, denoted $S_L$, $S_{Cb}$ and $S_{Cr}$. $S_L$ is a two-dimensional array (that is, a block) of luma samples. $S_{Cb}$ is a two-dimensional array of Cb chrominance samples. $S_{Cr}$ is a two-dimensional array of Cr chrominance samples. Chrominance samples can also be referred to here as chroma samples. In other cases, an image can be monochrome and can include only an array of luma samples.
[0065] To generate an encoded representation of an image, the video encoder 20 can generate a set of coding tree units (CTUs). Each of the CTUs can comprise a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In monochrome images or images that have three separate color planes, a CTU can comprise a single coding tree block and syntax structures used to code the samples of the coding tree block. A coding tree block can be an NxN block of samples. A CTU can also be referred to as a tree block or a largest coding unit (LCU). The CTUs of HEVC can be broadly analogous to the macroblocks of other video coding standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and can include one or more coding units (CUs). A slice can include an integer number of CTUs ordered
consecutively in a raster scan order.
[0066] A CTB contains a quad-tree whose nodes are coding units. The size of a CTB can vary from 16x16 to 64x64 in the HEVC main profile (although technically 8x8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB, though as small as 8x8. Each coding unit is coded with one mode. When a CU is inter-coded, the CU can be partitioned into 2 or 4 prediction units (PUs), or become just one PU when further partitioning does not apply. When two PUs are present in one CU, the two PUs can be half-size rectangles or two rectangles with 1/4 and 3/4 the size of the CU.
[0067] To generate an encoded CTU, the video encoder 20 can recursively perform quad-tree partitioning on the coding tree blocks of a CTU in order to divide the coding tree blocks into coding blocks, hence the name coding tree units. A coding block is an NxN block of samples. A CU can comprise a luma sample coding block and two corresponding chroma sample coding blocks of an image that has a luma sample array, a Cb sample array and a Cr sample array, and syntax structures used to encode the samples of the coding blocks. In monochrome images or images that have three separate color planes, a CU can comprise a single coding block and syntax structures used to encode the samples of the
coding block.
[0068] The video encoder 20 can partition an encoding block of a CU into one or more prediction blocks. A prediction block is a rectangular block (that is, square or non-square) of samples to which the same prediction is applied. A CU prediction unit (PU) may comprise a luma sample prediction block, two corresponding chroma sample prediction blocks and syntax structures used to predict the prediction block samples. In monochrome images or images that have three separate color planes, a PU can comprise a single prediction block and syntax structures used to predict the samples in the prediction block. The video encoder 20 can generate predictive luma, Cb and Cr blocks for the luma, Cb and Cr prediction blocks of each PU of the CU.
[0069] The video encoder 20 can use intra-prediction or inter-prediction to generate the predictive blocks for a PU. If the video encoder 20 uses intra-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the image associated with the PU. If the video encoder 20 uses inter-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of one or more images other than the image associated with the PU. When the CU is inter-coded, one set of motion information can be present for each PU. In addition, each PU can be coded with a unique mode of
inter-prediction to derive the set of motion information.
[0070] After the video encoder 20 generates predictive luma, Cb and Cr blocks for one or more PUs of a CU, the video encoder 20 can generate a luma residual block for the CU. Each sample in the CU's luma residual block indicates the difference between a luma sample in one of the CU's predictive luma blocks and a corresponding sample in the CU's original luma coding block. In addition, the video encoder 20 can generate a Cb residual block for the CU. Each sample in the CU's Cb residual block can indicate the difference between a Cb sample in one of the CU's predictive Cb blocks and a corresponding sample in the CU's original Cb coding block. The video encoder 20 can also generate a Cr residual block for the CU. Each sample in the CU's Cr residual block can indicate the difference between a Cr sample in one of the CU's predictive Cr blocks and a corresponding sample in the CU's original Cr coding block.
[0071] In addition, the video encoder 20 can use quad-tree partitioning to decompose the luma, Cb and Cr residual blocks of a CU into one or more luma, Cb and Cr transform blocks. A transform block can be a rectangular block of samples to which the same transform is applied. A transform unit (TU) of a CU can comprise a luma sample transform block, two corresponding chroma sample transform blocks and syntax structures used to transform the transform block samples. Thus, each TU of a CU can be associated
with a luma transform block, a Cb transform block and a Cr transform block. The luma transform block associated with the TU can be a sub-block of the CU's luma residual block. The Cb transform block can be a sub-block of the CU's Cb residual block. The Cr transform block can be a sub-block of the CU's Cr residual block. In monochrome images or images that have three separate color planes, a TU can comprise a single transform block and syntax structures used to transform the samples of the transform block.
[0072] The video encoder 20 can apply one or more transforms to a luma transform block of a TU in order to generate a luma coefficient block for the TU. A coefficient block can be a two-dimensional array of transform coefficients. A transform coefficient can be a scalar quantity. The video encoder 20 can apply one or more
transforms to a Cb transform block of a TU in order to generate a Cb coefficient block for the TU. The video encoder 20 can apply one or more transforms to a Cr transform block of a TU in order to generate a Cr coefficient block for the TU. [0073] After generating a coefficient block (a luma coefficient block, a Cb coefficient block or a Cr coefficient block, for
example), the video encoder 20 can quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the
transform coefficients, providing further compression. After the video encoder 20 quantizes a coefficient block, the video encoder 20 can entropy encode syntax elements that indicate the quantized transform coefficients. For example, the video encoder 20 can perform Context-Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements that indicate the quantized transform coefficients.
[0074] The video encoder 20 can output a bit stream that includes a sequence of bits that forms a representation of encoded images and associated data. The bit stream can comprise a sequence of NAL units. An NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes that contain that data in the form of an RBSP interspersed, as necessary, with emulation prevention bits. Each of the NAL units includes an NAL unit header and encapsulates an RBSP. The NAL unit header can include a syntax element that indicates an NAL unit type code. The NAL unit type code specified by the NAL unit header of an NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure that contains an integer number of bytes that is encapsulated within an NAL unit. In some cases, an RBSP includes zero bits.
[0075] Different types of NAL units can encapsulate different types of RBSPs. For example, a first type of NAL unit can encapsulate an RBSP for a PPS, a second type of NAL unit can encapsulate an RBSP for an encoded slice, a third type of
NAL unit can encapsulate an RBSP for SEI messages, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) can be referred to as VCL NAL units.
[0076] The video decoder 30 can receive a bit stream generated by the video encoder 20. In addition, the video decoder 30 can parse the bit stream in order to obtain syntax elements from the bit stream. The video decoder 30 can reconstruct the images of the video data based, at least in part, on the syntax elements obtained from the bit stream. The process of reconstructing the video data can generally be complementary to the process performed by the video encoder 20. In addition, the video decoder 30 can inverse quantize coefficient blocks associated with TUs of the current CU. The video decoder 30 can perform inverse transforms on the coefficient blocks in order to reconstruct transform blocks associated with the TUs of the current CU. The video decoder 30 can reconstruct the coding blocks of the current CU by adding the samples of the predictive blocks for PUs of the current CU to corresponding samples of the transform blocks of the current CU's TUs. By reconstructing the coding blocks for each CU of an image, the video decoder 30 can reconstruct the image.
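A minimal sketch of this decoder-side reconstruction path, mirroring the illustrative forward path sketched earlier (the float inverse DCT and uniform inverse quantizer are stand-ins, not the normative HEVC design):

```python
import numpy as np
from scipy.fftpack import idct

def reconstruct_block(predictive, quantized_coeffs, qstep):
    # Inverse quantization of the coefficient block (illustrative uniform
    # scaling).
    coeffs = quantized_coeffs.astype(np.float64) * qstep
    # Inverse transform back to the pixel domain (2-D inverse DCT as a
    # stand-in for the normative inverse integer transform).
    residual = idct(idct(coeffs, axis=1, norm='ortho'), axis=0, norm='ortho')
    # Add the predictive samples to the residual to reconstruct the block.
    return predictive + residual
```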
[0077] According to the techniques of this disclosure, the video encoder 20 and/or the video decoder 30 can further perform BIO techniques during
motion compensation, as discussed in more detail below.
[0078] The video encoder 20 and the video decoder 30 can each be implemented as any of a variety of suitable encoder or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. Each of the video encoder 20 and the video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including the video encoder 20 and the video decoder 30 can comprise an integrated circuit, a microprocessor and/or a wireless communication device, such as a cell phone.
[0079] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME), such as a block matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC). In general, a video coder (such as video encoder 20 or video decoder 30) performs unilateral ME to obtain motion vectors (MVs), such as MV 112, by searching for the best matching block (such as reference block 108) in reference frame 102 for current block 106 of current frame 100. Then, the
video coder interpolates an interpolated block 110 along the motion trajectory of motion vector 112 in interpolated frame 104. That is, in the example of Figure 2, motion vector 112 passes through the midpoints of current block 106, reference block 108 and interpolated block 110.
[0080] As shown in Figure 2, three blocks in three frames are involved following the motion trajectory. Although current block 106 in current frame 100 belongs to a coded block, the best matching block in reference frame 102 (that is, reference block 108) need not belong entirely to a coded block (that is, the best matching block may not fall on a coded block boundary, but instead may overlap such a boundary). Likewise, interpolated block 110 in interpolated frame 104 need not belong entirely to a coded block. Consequently, overlapped regions of blocks and unfilled regions (holes) can occur in interpolated frame 104.
[0081] To handle overlaps, simple FRUC algorithms may involve averaging and overwriting the overlapped pixels. In addition, holes can be covered by pixel values from a reference or a current frame. However, these algorithms can result in blocking artifacts and blurring. Consequently, motion field segmentation, successive extrapolation using the discrete Hartley transform, and image inpainting can be used to handle holes and overlaps without increasing blocking artifacts and blurring.
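A sketch of the simple averaging strategy for overlaps and holes, assuming interpolated blocks are accumulated into a sum image with a per-pixel contribution count (the names and the fallback-frame handling are illustrative assumptions):

```python
import numpy as np

def accumulate_block(frame_sum, frame_count, block, x0, y0):
    # Accumulate an interpolated block and count per-pixel contributions,
    # so overlapped regions can be averaged afterwards.
    h, w = block.shape
    frame_sum[y0:y0 + h, x0:x0 + w] += block
    frame_count[y0:y0 + h, x0:x0 + w] += 1

def finalize_frame(frame_sum, frame_count, fallback):
    # Average overlapped pixels; cover holes (count == 0) with pixel values
    # from a reference or current frame, as described above.
    hole = frame_count == 0
    return np.where(hole, fallback, frame_sum / np.maximum(frame_count, 1))
```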
[0082] Figure 3 is a conceptual diagram that illustrates an example of bilateral ME, such as a BMA performed for MC-FRUC. Bilateral ME is another solution (in MC-FRUC) that can be used to avoid the problems caused by overlaps and holes. A video coder (such as video encoder 20 and/or video decoder 30) that performs bilateral ME obtains MVs 132, 134 passing through interpolated block 130 of interpolated frame 124 (which is intermediate between current frame 120 and reference frame 122) using temporal symmetry between current block 126 of current frame 120 and reference block 128 of reference frame 122. As a result, the video coder does not generate overlaps and holes in interpolated frame 124. Assuming that current block 126 is a block that the video coder processes in a certain order, for example, as in the case of video coding, a sequence of such blocks would cover the entire intermediate image without overlap. For example, in the case of video coding, the blocks can be processed in decoding order. Therefore, such a method may be more suitable if FRUC ideas can be considered in a video coding framework.
[0083] S.-F. Tu, O. C. Au, Y. Wu, E. Luo and C.-H. Yeun, A novel framework for frame rate up conversion by predictive variable block-size motion estimated optical flow, International Congress on Image and Signal Processing (CISP), 2009, described a hybrid block-level motion estimation and pixel-level optical flow method for frame rate up-conversion. Tu stated
that the hybrid scheme performed better than either individual method.
[0084] In the HEVC standard, there are two inter-prediction modes, called merge mode (with skip mode considered as a special case of merge) and advanced motion vector prediction (AMVP) mode. In either AMVP or merge mode, a video coder maintains an MV candidate list of multiple motion vector predictors. A video coder determines the motion vector(s) for a specific PU, as well as the reference indices in merge mode, by selecting a candidate from the MV candidate list.
[0085] In HEVC, the MV candidate list contains up to 5 candidates for merge mode and only two candidates for AMVP mode. Other coding standards can include more or fewer candidates. A merge candidate can contain a set of motion information, such as motion vectors corresponding to both reference image lists (list 0 and list 1) and the reference indices. A video decoder receives a merge candidate identified by a merge index, and the video decoder predicts a current PU using the identified reference image(s) and motion vector(s). However, for AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MV predictor (MVP) index into the MV candidate list, since the AMVP candidate contains only a motion
vector. In AMVP mode, the predicted motion vectors can be further refined.
[0086] A merge candidate corresponds to a full set of motion information, while an AMVP candidate contains only one motion vector for a specific prediction direction and a reference index. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.
[0087] Figure 4A shows spatial neighboring MV candidates for merge mode and Figure 4B shows spatial neighboring MV candidates for AMVP modes. Spatial MV candidates are derived from the neighboring blocks shown in Figures 4A and 4B for a specific PU (PU0), although the methods of generating the candidates from the blocks differ for the merge and AMVP modes.
[0088] In merge mode, up to four spatial MV candidates can be derived in the order shown in Figure 4A. The order is as follows: left (0), above (1), above-right (2), below-left (3) and above-left (4), as shown in Figure 4A. If all spatial MV candidates 0-3 are available and unique, then the video coder may not include motion information for the above-left block in the candidate list. If, however, one or more of spatial MV candidates 0-3 are not available or not unique, then the video coder can include motion information for the above-left block in the candidate list.
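The ordering and uniqueness checks described above can be sketched as follows. This is a behavioral illustration, not the normative HEVC derivation, which also applies position-dependent comparison rules:

```python
def spatial_merge_candidates(neighbors):
    # neighbors: dict mapping a position name to its motion info (or None
    # when the neighboring block is unavailable).
    candidates = []
    for pos in ("left", "above", "above_right", "below_left"):
        mi = neighbors.get(pos)
        if mi is not None and mi not in candidates:
            candidates.append(mi)
    # The above-left block (4) is considered only when one of candidates
    # 0-3 is unavailable or duplicated.
    if len(candidates) < 4:
        mi = neighbors.get("above_left")
        if mi is not None and mi not in candidates:
            candidates.append(mi)
    return candidates
```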
[0089] In AMVP mode, the neighboring blocks are divided into two groups: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3 and 4, as shown in Figure 4B. For each group, the potential candidate in a neighboring block referring to the same reference image as that indicated by the signaled reference index has the highest priority to be chosen to form a final candidate for the group. It is possible that none of the neighboring blocks contains a motion vector pointing to the same reference image. Therefore, if such a candidate cannot be found, the first available candidate is scaled to form the final candidate, so that differences in temporal distance can be compensated.
[0090] Figure 5A shows an example of a TMVP candidate and Figure 5B shows an example of MV scaling. The temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. The motion vector derivation process for the TMVP candidate is the same for both merge and AMVP modes; however, the target reference index for the TMVP candidate in merge mode is always set to 0.
[0091] The primary block location for TMVP candidate derivation is the bottom-right block outside of the co-located PU, shown as block T in Figure 5A, to compensate for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located
outside of the current CTB row, or if its motion information is not available, the block is replaced by a central block of the PU.
[0092] The motion vector for the TMVP candidate is derived from the co-located PU of the co-located image, indicated at the slice level. The motion vector for the co-located PU is called the co-located MV. Similar to the temporal direct mode in AVC, to derive the TMVP candidate motion vector, the co-located MV needs to be scaled to compensate for differences in temporal distance, as shown in Figure 5B.
[0093] HEVC also uses motion vector scaling. It is assumed that the value of motion vectors is proportional to the distance between images in presentation time. A motion vector associates two images, the reference image and the image containing the motion vector (namely the containing image). When a motion vector is used to predict another motion vector, the distance between the containing image and the reference image is calculated based on the POC values.
[0094] For a motion vector to be predicted, its associated containing image and reference image may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing images of the two motion vectors are the same, while the reference images are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring
candidates.
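The POC-based scaling can be sketched with HEVC-style fixed-point arithmetic, where tb is the POC distance between the current image and the target reference image and td is the POC distance between the containing image and its reference image. The constants and clipping ranges below follow the HEVC design, but the function itself is an illustrative sketch:

```python
def scale_mv(mv, tb, td):
    # tx approximates 16384/td in integer arithmetic (division truncated
    # toward zero, as in C).
    num = 16384 + (abs(td) >> 1)
    tx = num // td if td > 0 else -(num // -td)
    # Distance scale factor, clipped to [-4096, 4095].
    scale = max(-4096, min(4095, (tb * tx + 32) >> 6))
    s = scale * mv
    sign = 1 if s >= 0 else -1
    # Scaled motion vector, rounded and clipped to a 16-bit range.
    return max(-32768, min(32767, sign * ((abs(s) + 127) >> 8)))
```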
[0095] HEVC also uses artificial motion vector candidate generation. If a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the list has all of its entries. In merge mode, there are two types of artificial MV candidates: the combined candidate, derived only for B slices, and the zero candidate, used only for AMVP if the first type does not provide enough artificial candidates. For each pair of candidates that are already in the candidate list and have the necessary motion information, bidirectional combined motion vector candidates are derived by a combination of the motion vector of a first candidate referring to an image in list 0 and the motion vector of a second candidate referring to an image in list 1.
[0096] HEVC also uses a pruning process for candidate insertion. Candidates from different blocks can happen to be identical, which decreases the efficiency of an AMVP/merge candidate list. A pruning process can be applied to solve this problem. A pruning process compares one candidate with the others in the current candidate list to avoid inserting an identical candidate. To reduce complexity, only a limited number of pruning comparisons can be applied, instead of comparing each potential candidate with all existing ones. As an example, a video encoder can apply a pruning process to spatial and temporal neighboring candidates, but not to artificially generated candidates.
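A sketch of such a limited pruning process follows; the cap of two comparisons is an illustrative assumption, and the actual number is codec-specific.

    def prune_and_insert(cand_list, new_cand, max_checks=2):
        """Insert new_cand unless it matches one of the first few existing candidates."""
        for existing in cand_list[:max_checks]:   # limited number of comparisons
            if existing == new_cand:
                return False                      # identical candidate, do not insert
        cand_list.append(new_cand)
        return True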
[0097] Aspects of bidirectional optical flow in JEM will now be described. Figure 6 shows an example of an optical flow path. BIO uses a sample-wise motion refinement that is performed on top of block-wise motion compensation in the case of bi-prediction. Since BIO compensates the fine motion inside the block, enabling BIO effectively results in enlarging the block size for motion compensation. Sample-level motion refinement does not require exhaustive search or signaling; instead, an explicit equation provides a refined motion vector for each sample.
[0098] Let I^(k) be the luminance value from reference k (k = 0, 1) after block motion compensation, and let ∂I^(k)/∂x and ∂I^(k)/∂y denote the horizontal and vertical components of the gradient of I^(k), respectively. Assuming that the optical flow is valid, the motion vector field (v_x, v_y) is given by the equation

∂I^(k)/∂t + v_x·∂I^(k)/∂x + v_y·∂I^(k)/∂y = 0.    (1)

[0099] Combining the optical flow equation with Hermite interpolation for the motion path of each sample, a unique third-order polynomial is obtained that matches both the function values I^(k) and the derivatives ∂I^(k)/∂x, ∂I^(k)/∂y at the ends. The value of this polynomial at t = 0 is the BIO prediction:
pred_BIO = (1/2)·(I^(0) + I^(1) + (v_x/2)·(τ1·∂I^(1)/∂x - τ0·∂I^(0)/∂x) + (v_y/2)·(τ1·∂I^(1)/∂y - τ0·∂I^(0)/∂y))    (2)

[0100] Here, τ0 and τ1 denote the distances to the reference frames, as shown in Figure 6. The distances τ0 and τ1 are calculated based on the POC values for Ref0 and Ref1:
τ0 = POC(current) - POC(Ref0), τ1 = POC(Ref1) - POC(current). If both predictions come from the same time direction (either both from the past or both from the future), then the signs differ, τ0·τ1 < 0. In this case, BIO is applied only if the predictions do not come from the same time point (τ0 ≠ τ1), both referenced regions have non-zero motion (MVx0, MVy0, MVx1, MVy1 ≠ 0), and the block motion vectors are proportional to the temporal distances (MVx0/MVx1 = MVy0/MVy1 = -τ0/τ1).
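These enabling conditions can be summarized in a small sketch, assuming MVs are given as integer (x, y) pairs; the cross-multiplied form of the proportionality test avoids division.

    def bio_applicable(poc_cur, poc_ref0, poc_ref1, mv0, mv1):
        """Check the BIO enabling conditions described above (sketch)."""
        tau0 = poc_cur - poc_ref0            # distance to Ref0
        tau1 = poc_ref1 - poc_cur            # distance to Ref1
        if tau0 > 0 and tau1 > 0:
            return True                      # references straddle the current image
        if tau0 == tau1:
            return False                     # predictions from the same time point
        if mv0 == (0, 0) or mv1 == (0, 0):
            return False                     # a referenced region has zero motion
        # MVx0/MVx1 = MVy0/MVy1 = -tau0/tau1, written without division
        return (mv0[0] * tau1 == -mv1[0] * tau0 and
                mv0[1] * tau1 == -mv1[1] * tau0)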
[0101] The motion vector field (v_x, v_y), also referred to as the amount of BIO motion, is determined by minimizing the difference Δ between the values at points A and B (the intersections of the motion path with the reference image planes in Figure 6). The model uses only the first linear term of the local Taylor expansion for Δ:
Δ = I^(0) - I^(1) + v_x·(τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x) + v_y·(τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)    (3)

[0102] All values in (3) depend on the sample location (i', j'), which has been omitted from the notation so far. Assuming that the motion is consistent in the local surroundings, Δ within the (2M + 1) × (2M + 1) square window Ω centered on the currently predicted point (i, j) can be
minimized as follows:

(v_x, v_y) = argmin over (v_x, v_y) of Σ_{(i',j')∈Ω} Δ²(i', j')    (4)

[0103] For this optimization problem, a simplified solution that performs the minimization first in the horizontal direction and then in the vertical direction can be used, which results in:
v_x = (s1 + r) > m ? clip3(-thBIO, thBIO, -s3/(s1 + r)) : 0    (5)

v_y = (s5 + r) > m ? clip3(-thBIO, thBIO, -(s6 - v_x·s2/2)/(s5 + r)) : 0    (6)

where,
s1 = Σ_{(i',j')∈Ω} (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x)²;
s2 = Σ_{(i',j')∈Ω} (τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x)·(τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y);
s3 = Σ_{(i',j')∈Ω} (I^(0) - I^(1))·(τ1·∂I^(1)/∂x + τ0·∂I^(0)/∂x);
s5 = Σ_{(i',j')∈Ω} (τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)²;
s6 = Σ_{(i',j')∈Ω} (I^(0) - I^(1))·(τ1·∂I^(1)/∂y + τ0·∂I^(0)/∂y)    (7)

[0104] To avoid division by zero or by a too-small value, the regularization parameters r and m are introduced in equations (5) and (6):
r = 500 · 4^(d-8)    (8)

m = 700 · 4^(d-8)    (9)
Here d is the internal bit depth of the video input.
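The derivation in equations (5) to (9) can be sketched as follows for one window Ω, given the two predictors and their gradients as NumPy arrays. Floating-point arithmetic is used for clarity (the JEM implementation is integer-based), the sign conventions follow the reconstruction of equations (3) to (7) above, and the thBIO values anticipate paragraph [0105].

    import numpy as np

    def solve_bio_motion(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, d=8, one_dir=True):
        """Solve the BIO amount of motion (vx, vy) for one window Omega (sketch)."""
        psi_x = tau1 * gx1 + tau0 * gx0               # combined horizontal gradient
        psi_y = tau1 * gy1 + tau0 * gy0               # combined vertical gradient
        theta = i0 - i1                               # temporal difference, as in eq. (3)
        s1 = float(np.sum(psi_x * psi_x))
        s2 = float(np.sum(psi_x * psi_y))
        s3 = float(np.sum(theta * psi_x))
        s5 = float(np.sum(psi_y * psi_y))
        s6 = float(np.sum(theta * psi_y))
        r = 500 * 4 ** (d - 8)                        # equation (8)
        m = 700 * 4 ** (d - 8)                        # equation (9)
        th = 12 * 2 ** ((14 if one_dir else 13) - d)  # clipping threshold thBIO
        vx = float(np.clip(-s3 / (s1 + r), -th, th)) if s1 + r > m else 0.0
        vy = (float(np.clip(-(s6 - vx * s2 / 2) / (s5 + r), -th, th))
              if s5 + r > m else 0.0)
        return vx, vy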
[0105] In some cases, the MV refinement of BIO can be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a certain threshold thBIO. The threshold value is determined based on whether the reference images of the current image are all from one direction. If all reference images of the current image are from one direction, the threshold is set to 12 × 2^(14-d); otherwise, it is set to 12 × 2^(13-d).
[0106] Gradients for BIO are calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process (separable 2D FIR). The input for this separable 2D FIR is the same reference image sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. In the case of the vertical gradient, a gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d - 8; the signal displacement is then performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18 - d. The length of the interpolation filters for gradient calculation (BIOfilterG) and signal displacement (BIOfilterS) is shorter (6-tap) in order to maintain reasonable complexity. Table 1 shows the filters used to calculate gradients for different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters used to generate BIO prediction signals.
[0107] Figure 7 shows an example of the gradient calculation for an 8x4 block. For an 8x4 block, a video encoder fetches the motion compensated predictors and calculates the HOR/VER gradients of all pixels within the current block as well as of the two outer lines of pixels, because solving vx and vy for each pixel requires the HOR/VER gradient values and the motion compensated predictors of the pixels within the window Ω centered on that pixel, as shown in equation (4). In JEM, the size of this window is set to 5x5; therefore, the video encoder needs to fetch the motion compensated predictors and calculate the gradients for the two outer lines of pixels.
Table 1: Filters for calculating BIO gradients
Fractional position    Gradient interpolation filter (BIOfilterG)
0        {8, -39, -3, 46, -17, 5}
1/16     {8, -32, -13, 50, -18, 5}
1/8      {7, -27, -20, 54, -19, 5}
3/16     {6, -21, -29, 57, -18, 5}
1/4      {4, -17, -36, 60, -15, 4}
5/16     {3, -9, -44, 61, -15, 4}
3/8      {1, -4, -48, 61, -13, 3}
7/16     {0, 1, -54, 60, -9, 2}
1/2      {-1, 4, -57, 57, -4, 1}
Table 2: Interpolation filters for generating BIO prediction signals
Fractional position    Prediction signal interpolation filter (BIOfilterS)
0        {0, 0, 64, 0, 0, 0}
1/16     {1, -3, 64, 4, -2, 0}
1/8      {1, -6, 62, 9, -3, 1}
3/16     {2, -8, 60, 14, -5, 1}
1/4      {2, -9, 57, 19, -7, 2}
5/16     {3, -10, 53, 24, -8, 2}
3/8      {3, -11, 50, 29, -9, 2}
7/16     {3, -11, 44, 35, -10, 3}
1/2      {1, -7, 38, 38, -7, 1}
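As an illustration of the separable 6-tap filtering of paragraph [0106], the following sketch applies the half-pel rows of Tables 1 and 2 along one row of samples. Border handling and the full rounding of JEM are omitted, and d = 8 is assumed so the gradient de-scaling shift is zero.

    BIO_FILTER_G_HALF = [-1, 4, -57, 57, -4, 1]   # Table 1, fractional position 1/2
    BIO_FILTER_S_HALF = [1, -7, 38, 38, -7, 1]    # Table 2, fractional position 1/2

    def fir6(row, taps, shift):
        """6-tap FIR along one row of integer samples (borders skipped for brevity)."""
        return [sum(t * row[x - 2 + k] for k, t in enumerate(taps)) >> shift
                for x in range(2, len(row) - 3)]

    row = [100, 102, 104, 108, 112, 114, 116, 118, 121, 125]   # toy reference samples
    grad_x = fir6(row, BIO_FILTER_G_HALF, shift=0)   # horizontal gradient at fracX = 1/2
    pred = fir6(row, BIO_FILTER_S_HALF, shift=6)     # half-pel prediction (filter gain 64)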
[0108] In JEM, BIO is applied to all bidirectionally predicted blocks when the two predictions come from different reference images. When local illumination compensation (LIC) is enabled for a CU, BIO is disabled.
[0109] Figure 8 shows an example of the modified BIO for an 8x4 block proposed in JVET-D0042. At the 4th JVET meeting, proposal JVET-D0042 (A. Alshina, E. Alshina, AHG6: On BIO memory bandwidth, JVET-D0042, October 2016) was submitted to modify the BIO operations and reduce the memory access bandwidth. In that proposal, motion compensated predictors and gradient values are not required for pixels outside the current block. In addition, the derivation of vx and vy for each pixel is modified to use the motion compensated predictors and the gradient values of all pixels within the current block, as shown in Figure 8. In other words, the square window Ω in equation (4) is changed to a window equal to the current block. In addition, a weighting factor w(i', j') is considered when deriving vx and vy, where w(i', j') is a function of the position of the central pixel (i, j) and of the positions of the pixels (i', j') within the window.
[Equation (10): the weighted form of equation (7), in which each summand in the accumulations s1 to s6 is scaled by w(i', j').]    (10)

[0110] Aspects of Overlapped Block Motion Compensation (OBMC) in JEM will now be described. OBMC has been used in earlier generations of video standards, for example in H.263. In JEM, OBMC is performed for all motion compensated (MC) block boundaries, except the right and bottom boundaries of a CU. In addition, OBMC can be applied to both the luma and the chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (which includes sub-CU merge, Affine and FRUC mode, as described in J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, Algorithm Description of Joint Exploration Test Model 4, JVET-D1001, October 2016), each sub-block of the CU is an MC block. In order to process CU boundaries uniformly, OBMC is performed at the sub-block level for all MC block boundaries, where the size of the sub-block is set equal to 4x4, as shown in Figures 9A and 9B.
[0111] When OBMC applies to the current sub-block, in addition to the current motion vectors, the motion vectors of the four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive the prediction block for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal for the current sub-block.
[0112] As shown in Figure 10, the prediction block based on the motion vectors of a neighboring sub-block is denoted as PN, with N indicating an index for the neighboring sub-blocks above, below, to the left and to the right, and the prediction block based on the motion vectors of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, each pixel of PN is added to the same pixel in PC, that is, four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exception is small MC blocks (that is, when the height or width of the coding block is equal to 4, or a CU is coded with the sub-CU mode), for which only two rows/columns of PN are added to PC. In this case, the weighting factors {1/4, 1/8} are used for PN and the weighting factors {3/4, 7/8} are used for PC. For a PN generated based on the motion vectors of a vertically (horizontally) neighboring sub-block, the pixels in the same row (column) of PN are added to PC with the same weighting factor. BIO can also be applied in the derivation of the prediction block PN.
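As an illustration of this blending, the sketch below folds the top-neighbor prediction PN into PC with the weights listed above; floating-point weights are used for clarity, whereas an actual codec would use integer weights and rounding.

    def obmc_blend_top(pc, pn, small_block=False):
        """Blend rows of PN (top-neighbor prediction) into PC along the top edge."""
        weights = [1 / 4, 1 / 8, 1 / 16, 1 / 32][:2 if small_block else 4]
        for r, wn in enumerate(weights):              # PN weight wn, PC weight 1 - wn
            pc[r] = [(1 - wn) * c + wn * n for c, n in zip(pc[r], pn[r])]
        return pc

    pc = [[100.0] * 8 for _ in range(8)]   # prediction from the current sub-block's MV
    pn = [[120.0] * 8 for _ in range(8)]   # prediction from the top neighbor's MV
    pc = obmc_blend_top(pc, pn)            # rows 0..3 blended with 1/4, 1/8, 1/16, 1/32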
[0113] In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether or not OBMC is applied for the current CU. For CUs with size greater than 256 luma samples or not coded with the AMVP mode, OBMC is applied by default. In the encoder, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal that uses the motion information of the top neighboring block and of the left neighboring block is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied.
[0114] While BIO potentially provides more than 1% Bjøntegaard-Delta bit rate (BD-rate) reduction in JEM4.0, BIO also potentially introduces significant computational complexity and may increase the required memory bandwidth for both the encoder and the decoder. This disclosure describes techniques that can potentially reduce the computational complexity and the required memory bandwidth associated with BIO. As an example, according to the techniques of this disclosure, a video encoder can determine an amount of BIO motion, such as the values vx and vy described above, at a sub-block level, and use that determined amount of BIO motion to modify sample values of a predictive block on a sample-by-sample basis. Therefore, the techniques of this disclosure can improve video encoders and video decoders by allowing them to achieve the coding gains of BIO without incurring the substantial processing and memory costs required by existing BIO implementations.
[0115] Based on equation (4), this disclosure introduces techniques to reduce the complexity of BIO by redefining the window Ω. Such techniques can, for example, be performed by video encoder 20 (for example, by motion estimation unit 42 and/or motion compensation unit 44) or by video decoder 30 (for example, by motion compensation unit 72). The window Ω is defined as any block within the current block that covers the current pixel, with size MxN, where M and N are any positive integers. In one example, the current block is divided into non-overlapping sub-blocks and the window Ω is defined as the sub-block covering the current pixel. In another example, as shown in Figure 11, the sub-block is defined as the smallest block for motion vector storage that covers the current pixel. In HEVC and JEM, the smallest block size is 4x4. In another example, the window size Ω is adapted according to the coding information, such as the current block size or the coding modes. When the current block size is larger, a larger window Ω can be used. When the current block is coded with a sub-block mode, such as sub-CU merge, Affine and FRUC mode, the window Ω is defined as the sub-block.
[0116] Figure 11 shows an example of the proposed BIO for an 8x4 block, according to the techniques of this disclosure, with a window Ω for pixels A, B and C. According to the techniques of this disclosure, equal weights can be used to solve for vx and vy, as shown in equation (7). In another example, different weights can be used to solve for vx and vy, as shown in equation (10). The unequal weights can be a function of the distances between the central pixel and the associated pixels. In yet another example, the weights can be calculated using a bilateral approach, as described in https://en.wikipedia.org/wiki/Bilateral_filter. In addition, lookup tables can be used to store all the weighting factors of each pixel for the window Ω in equation (7).
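To illustrate the sub-block window, the sketch below (reusing the hypothetical solve_bio_motion helper from the sketch after equation (9)) solves one (vx, vy) per non-overlapping 4x4 sub-block with equal weights and applies it to every sample of that sub-block through equation (2).

    import numpy as np

    def bio_per_subblock(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, sb=4):
        """One BIO amount of motion per non-overlapping sb x sb sub-block (sketch)."""
        h, w = i0.shape
        pred = np.zeros((h, w))
        for y in range(0, h, sb):
            for x in range(0, w, sb):
                s = np.s_[y:y + sb, x:x + sb]         # the window Omega = this sub-block
                vx, vy = solve_bio_motion(i0[s], i1[s], gx0[s], gy0[s],
                                          gx1[s], gy1[s], tau0, tau1)
                # equation (2), with one (vx, vy) shared by the whole sub-block
                pred[s] = 0.5 * (i0[s] + i1[s]
                                 + 0.5 * vx * (tau1 * gx1[s] - tau0 * gx0[s])
                                 + 0.5 * vy * (tau1 * gy1[s] - tau0 * gy0[s]))
        return pred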
[0117] In another example, when deriving PN for OBMC, BIO is performed only for a subset of the pixels when deriving the predictors using the neighboring motion. In one example, BIO is completely disabled for all pixels in the derivation of PN. In yet another example, BIO is applied only to the pixels in the two outer lines, as shown in Figures 12A-12D.
[0118] In addition, for each block, how many BIO lines are applied can be explicitly signaled at the SPS, PPS or slice level. Whether BIO is disabled or partially disabled can also be explicitly signaled at the SPS, PPS or slice level.
[0119] Alternatively, how many BIO lines are applied can be derived implicitly based on certain coding conditions, such as the CU mode (sub-block or non-sub-block mode) or the block size, or on the combination with other tools, such as a signaled illumination compensation (IC) flag. Whether BIO is disabled or partially disabled can likewise be derived implicitly based on such conditions, such as the CU mode (sub-block or non-sub-block mode) or the block size, or on the combination with other tools, such as a signaled IC flag.
[0120] Figures 12A-12D show examples of the simplified BIO in OBMC proposed according to the techniques of this disclosure, in which x represents a predictor derived without BIO and o represents a predictor derived with BIO. The motion vector refinement of BIO can be block-based. Regardless of the size of the M-by-N block, a weighting function can be used to provide different scaling factors for pixels at different locations when calculating the terms in equation (7). When solving equations (5) and (6), the interpolated pixels and their gradient values gathered from the entire block can be used to solve for vx and vy jointly, instead of solving vx and vy individually for each pixel position.
[0121] In one example, the window Ω can be defined as a sliding window centered on each pixel location, and the average over all locations is used when accumulating the terms. Specifically,

s = (1/N) · Σ_{k=1..N} s^(k)    (11)

where s^(k) denotes the accumulation of equation (7) over the window Ω_k, N is the number of pixels in each sub-block, and Ω_k is the window defined for pixel k. In one example, Ω_k can be the 5x5 window defined in the current BIO design for each pixel and, therefore, the weighting function can be determined in advance. An example of the weighting function used for a 4x4 sub-block with 5x5 windows is shown in Figure 13. Figure 13 shows an example of a weighting function for a 4x4 sub-block with a 5x5 window.
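One way to obtain such a precomputed weighting function is to count, for each position, how many of the per-pixel 5x5 windows Ω_k cover it, as sketched below; whether this reproduces the exact weights of Figure 13 is an assumption.

    def overlap_weights(sb=4, win=5):
        """Weight of each position = number of per-pixel windows covering it (sketch)."""
        half = win // 2
        ext = sb + 2 * half                    # sub-block extended by the window reach
        w = [[0] * ext for _ in range(ext)]
        for cy in range(sb):                   # one window centered on each sub-block pixel
            for cx in range(sb):
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        w[cy + half + dy][cx + half + dx] += 1
        return w                               # interior entries reach 16 for sb=4, win=5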
[0122] In another example, the weighting function can be signaled in the SPS, PPS or slice header. To reduce the signaling cost, a set of predefined weighting functions can be stored, and only the index of the weighting function needs to be signaled.
[0123] In another example, the refined motion vector can be found using pixels located in the central part of the sub-block. The gradient values of the central pixels can be calculated using the interpolation filter, and a window of size M-by-N can be applied to the interpolated pixels to give different weights to the central pixels when calculating the variables s1-s6 in equation (7). In one example, the gradient values of the central pixels can be calculated and the average value of the central pixels can be used (equal-weight window). In another example, a median filter can be used to select representative pixels for calculating the variables s1-s6 in equation (7).
[0124] In JVET-D0042, when solving for the BIO displacement(s), the window size for each pixel is modified to be the entire current block, which potentially adds computational complexity to the current design when a current block is greater than or equal to 8x4. The worst case of the modifications is that a 128x128 window is used for the accumulation of gradients and predictors for each pixel within a 128x128 block.

[0125] In addition, when sub-blocks within a CU share the same MV, or when an inter-coded CU is divided into smaller sub-blocks for motion compensation (MC), JEM-4.0 provides the flexibility either to perform MC and BIO for each sub-block in parallel or to perform MC and BIO at once for the larger block aggregated from the sub-blocks with the same MV. Either way, JEM-4.0 produces identical coding results. However, the modified BIO in JVET-D0042 uses a block-size-dependent gradient calculation and weighting factors, such that performing MC and BIO for two similar neighboring motion blocks together or separately can lead to different results. To avoid different results, it would have to be specified whether the decoder must perform MC and BIO at the block level or at a certain sub-block level. Such a restriction can be very strict and is not desirable for practical codec implementations.

[0126] Based on equation (4), the complexity of BIO can be further reduced by redefining the window Ω. Two types of window Ω are defined: one is the non-overlapping window and the other is a sliding window. For the non-overlapping window type, the current block is divided into non-overlapping sub-blocks and the window Ω is defined as the sub-block that covers the current pixel, as shown in Figure 11. For the sliding window type, the window Ω is defined as a block centered on the current pixel, as shown in Figure 7.
[0127] For both types of window Ω, the size of the window Ω can be determined using different methods, as shown below. In the following, it can be assumed that the window Ω is a rectangular block of size MxN, where M and N can be any positive integers (such as 4x4, 8x8, 16x16, 8x4 and so on). The window Ω is not limited to a rectangular shape and can have any other shape, such as a diamond shape. The described techniques can also be applied to shapes other than the rectangular shape, where applicable.
The window size can be fixed or variable and can be predetermined or signaled in the bitstream. When the size is signaled, it can be signaled in the sequence parameter set (SPS), the picture parameter set (PPS), the slice header or at the CTU level. The window size can be determined jointly with the size of the motion compensated (MC) block by the equations below.
Horizontal window size M = min(M, MC_Size);
Vertical window size N = min(N, MC_Size).
[0129] In one example, the motion compensated (MC) block depends purely on the coding information, such as the current block size and the coding modes. For example, the MC block is defined as the entire CU when the current CU is coded with modes other than the sub-block modes, such as sub-CU merge, Affine and FRUC mode. The MC block is set as a sub-block when sub-block modes, such as sub-CU merge, Affine and FRUC, are used, regardless of whether the sub-blocks have the same motion information.
[0130] In another example, the motion compensated (MC) block is defined as the block of samples within a CU that have the same MVs. In this case, the MC block is defined as the entire CU when the current CU is coded with modes other than the sub-block modes, such as sub-CU merge, Affine and FRUC mode. When a CU is coded with sub-block modes, such as sub-CU merge, Affine and FRUC mode, sub-blocks with the same motion information are merged into one MC block following a given sub-block scan order.
[0131] Adaptive size: the window size Ω is adapted according to the coding information, such as the current block size and the coding modes. In one example, the window Ω is set to the entire current block, or to one quarter of the current block, when the current block is coded with non-sub-block modes; and the window Ω is set to a sub-block when the current block is coded with a sub-block mode, such as sub-CU merge, Affine and FRUC mode. The adaptive window size can be determined jointly with the size of the motion compensated (MC) block by the equations below.
Horizontal window size M = min(M, MC_Size);
Vertical window size N = min(N, MC_Size).
[0132] For the different techniques of determining the size of the window Ω, a high-level limit on the size can be included for a hardware- or software-friendly implementation. For example, the window size may be required to be less than or equal to the maximum Transform Unit (TU) size allowed in the video codec system. In another example, the window size may be required to be greater than or equal to the smallest MC block, such as 4x4.
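The joint size derivation and the high-level bounds just described can be combined in a small sketch; the maximum TU size of 64 and the smallest MC block of 4x4 are illustrative assumptions.

    def omega_window_size(m, n, mc_size, max_tu=64, min_mc=4):
        """Window size jointly limited by the MC block size and high-level bounds."""
        m, n = min(m, mc_size), min(n, mc_size)   # M = min(M, MC_Size), same for N
        m = max(min(m, max_tu), min_mc)           # <= maximum TU, >= smallest MC block
        n = max(min(n, max_tu), min_mc)
        return m, n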
[0133] To further simplify BIO-related operations, this disclosure introduces techniques for performing BIO as post-processing, after all motion compensated prediction is finished. Specifically, after the conventional MC is completed, OBMC can be applied to generate better predictors for the current block. Based on the final predictor, BIO is then applied using the current block's motion information to further refine the predictor. For example, for the gradient calculation in BIO, the motion of the entire block can be used. In another example, for each sub-block, the average of the OBMC motion vectors can be used. In another example, for each sub-block, the median motion vector (for each dimension individually) can be used.
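Deriving the MV used by this post-OBMC BIO could look like the sketch below; the function name is hypothetical, and the median is taken per dimension, as described above.

    def subblock_bio_mv(obmc_mvs, mode="average"):
        """MV for post-OBMC BIO of one sub-block, from its OBMC MVs (sketch)."""
        xs = sorted(mv[0] for mv in obmc_mvs)
        ys = sorted(mv[1] for mv in obmc_mvs)
        if mode == "average":
            return (sum(xs) / len(xs), sum(ys) / len(ys))
        mid = len(xs) // 2                    # upper median for even-sized lists
        return (xs[mid], ys[mid])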
[0134] The weighting functions can be designed differently when considering the block-based derivation of the BIO motion vector refinement. Equal weights can be used for any of the methods mentioned above. Alternatively, larger weights can be placed towards the central part of the window. In one example, the weights can be computed from the inverse distance (including, but not limited to, the L1 norm or the L2 norm) between the center of the window and the pixel.
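A sketch of such inverse-distance weights follows; the mapping w = 1/(1 + dist) is an illustrative choice, since the disclosure does not fix a specific formula.

    def inverse_distance_weights(m, n, norm="L1"):
        """Window weights favoring the center, from the L1 or L2 distance (sketch)."""
        cy, cx = (m - 1) / 2.0, (n - 1) / 2.0
        weights = []
        for y in range(m):
            row = []
            for x in range(n):
                if norm == "L1":
                    dist = abs(y - cy) + abs(x - cx)
                else:                                      # L2 norm
                    dist = ((y - cy) ** 2 + (x - cx) ** 2) ** 0.5
                row.append(1.0 / (1.0 + dist))
            weights.append(row)
        return weights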
[0135] Figure 14 is a block diagram showing an example of a video encoder 20 that can implement techniques for bidirectional optical flow. Video encoder 20 can perform intra-coding and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or image. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or images of a video sequence. Intra-mode (I mode) can refer to any of several spatial coding modes. Inter modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporal coding modes.
[0136] As shown in Figure 14, video encoder 20 receives video data and stores the received video data in video data memory 38. Video data memory 38 can store video data to be encoded by the components of video encoder 20. The video data stored in video data memory 38 can be obtained, for example, from video source 18. Reference image memory 64 can be a reference image memory that stores reference video data for use in encoding video data by video encoder 20, for example, in intra- or inter-coding modes. Video data memory 38 and reference image memory 64 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM) or other types of memory devices. Video data memory 38 and reference image memory 64 can be provided by the same memory device or by separate memory devices. In various examples, video data memory 38 may be on-chip with other components of video encoder 20, or off-chip relative to those components.
[0137] Video encoder 20 receives a current video block within a video frame to be encoded. In the example of Figure 14, video encoder 20 includes a mode selection unit 40, a reference frame memory 64 (which can also be referred to as a decoded picture buffer (DPB)), an adder 50, a transform processing unit 52, a quantization unit 54 and an entropy coding unit 56. Mode selection unit 40, in turn, includes a motion compensation unit 44, a motion estimation unit 42, an intra-prediction unit 46 and a partition unit 48. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60 and an adder 62. A deblocking filter (not shown in Figure 14) can also be included to filter block boundaries in order to remove blocking artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of adder 62. Additional filters (in-loop or post-loop) can also be used in addition to the deblocking filter. Such filters are not shown for brevity but, if desired, can filter the output of adder 50 (as an in-loop filter).
[0138] During the encoding process, video encoder 20 receives a frame or video slice to be encoded. The frame or slice can be divided into several video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block with respect to one or more blocks in one or more reference frames in order to provide temporal prediction. Intra-prediction unit 46 can alternatively perform intra-predictive coding of the received video block with respect to one or more neighboring blocks in the same frame or slice as the block to be encoded in order to provide spatial prediction. Video encoder 20 can perform several coding passes, for example, to select an appropriate coding mode for each block of video data.
[0139] Furthermore, partition unit 48 can partition blocks of video data into sub-blocks, based on the evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 can initially partition a frame or slice into LCUs and partition each of the LCUs into sub-CUs based on rate-distortion analysis (such as rate-distortion optimization).
The mode selection unit 40 can also produce a quad-tree data structure that indicates the partitioning of an LCU into sub-CUs. Quad-tree leaf node CUs can include one or more PUs and one or more TUs.
[0140] Mode selection unit 40 can select one of the prediction modes, intra or inter, based on error results, for example, and provides the resulting predictive block to adder 50 to generate residual data and to adder 62 to reconstruct the encoded block for use as a reference frame. Mode selection unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information and other syntax information, to entropy coding unit 56.
[0141] Motion estimation unit 42 and motion compensation unit 44 can be highly integrated, but are shown separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a PU of a video block within the current frame or video image with respect to a predictive block within a reference frame (or other coded unit), relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block found to closely match the block to be coded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD) or other difference metrics. In some examples, video encoder 20 can calculate values for sub-integer pixel positions of reference images stored in reference frame memory 64. For example, video encoder 20 can interpolate values of quarter-pixel positions, eighth-pixel positions or other fractional pixel positions of the reference image. Therefore, motion estimation unit 42 can perform a motion search with respect to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
[0142] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-encoded slice by comparing the position of the PU with the position of a predictive block of a reference image. The reference image can be selected from a first list of reference images (List 0) or a second list of reference images (List 1), each of which identifies one or more reference images stored in the frame memory. reference 64. Motion estimation unit 42 sends the calculated motion vector to entropy coding unit 56 and motion compensation unit 44.
[0143] Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 can be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 can locate the predictive block that the motion vector points to in one of the reference image lists. Adder 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to luma components, and motion compensation unit 44 uses motion vectors calculated based on the luma components for both the chroma components and the luma components. Mode selection unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
[0144] In addition, motion compensation unit 44 can be configured to perform any or all of the techniques in this disclosure (alone or in any combination). While discussed in relation to motion compensation unit 44, it should be understood that the mode selection unit 40, motion estimation unit 42, partition unit 48 and / or entropy coding unit 56 can also be configured to perform certain techniques of this disclosure, alone or in combination with the motion compensation unit 44. In one example, the motion compensation unit
44 can be configured to perform the BIO techniques discussed herein.
[0145] Intra-prediction processing unit 46 can intra-predict the current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 can determine the intra-prediction mode to be used to encode the current block. In some examples, intra-prediction unit 46 can encode the current block using various intra-prediction modes, for example during separate coding passes, and intra-prediction unit 46 (or mode selection unit 40, in some examples) can select an appropriate intra-prediction mode to use from the tested modes.
[0146] For example, intra-prediction processing unit 46 can calculate rate-distortion values using rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. Intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
[0147] After selecting an intra-prediction mode for a block, intra-prediction unit 46 can provide information indicating the intra-prediction mode selected for the block to entropy coding unit 56. Entropy coding unit 56 can encode the information indicating the selected intra-prediction mode. Video encoder 20 may include configuration data in the transmitted bitstream, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of coding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table and a modified intra-prediction mode index table to be used for each of the contexts.
[0148] Video encoder 20 forms a residual video block by subtracting the prediction data from mode selection unit 40 from the original video block being coded. Adder 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 can perform other transforms that are conceptually similar to the DCT. Wavelet transforms, integer transforms, sub-band transforms or other types of transforms could also be used. In any case, transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform can convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 can send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter.
[0149] After quantization, entropy coding unit 56 entropy codes the quantized transform coefficients. For example, entropy coding unit 56 can perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. In the case of context-based entropy coding, the context can be based on neighboring blocks. After the entropy coding by entropy coding unit 56, the encoded bitstream can be transmitted to another device (video decoder 30, for example) or archived for later transmission or retrieval.
[0150] Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain. In particular, adder 62 adds the reconstructed residual block to the motion compensated prediction block produced by the motion compensation unit 44 or the intraprediction processing unit 46 to produce a reconstructed video block for storage in the reference frame memory 64. The reconstructed video block can be used by motion estimation unit 42 and motion compensation unit 44 as a reference block for inter-encoding a block in a subsequent video frame.
[0151] Figure 15 is a block diagram illustrating an example of a video decoder 30 that can implement techniques for bidirectional optical flow. In the example of Figure 15, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra-prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference frame memory 82 and an adder 80. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (Figure 14). Motion compensation unit 72 can generate prediction data based on the motion vectors received from entropy decoding unit 70, while intra-prediction unit 74 can generate prediction data based on the intra-prediction mode indicators received from entropy decoding unit 70.
[0152] During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Video decoder 30 stores the received encoded video bitstream in video data memory 68. Video data memory 68 can store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 30. The video data stored in video data memory 68 may be obtained, for example, via computer-readable medium 16, from a local video source, such as a camera, or by accessing a physical data storage medium. Video data memory 68 may form a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. Reference image memory 82 can be a reference image memory that stores reference video data for use in decoding video data by video decoder 30, for example, in intra- or inter-coding modes. Video data memory 68 and reference image memory 82 can be formed by any of a variety of memory devices, such as DRAM, SDRAM, MRAM, RRAM or other types of memory devices. Video data memory 68 and reference image memory 82 can be provided by the same memory device or by separate memory devices. In various examples, video data memory 68 may be on-chip with other components of video decoder 30, or off-chip relative to those components.

[0153] During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and related syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 can receive the syntax elements at the video slice level and/or at the video block level.
[0154] When the video slice is coded as an intra-coded slice (I), intra-prediction unit 74 can generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and on data from previously decoded blocks of the current frame or image. When the video slice is coded as an inter-coded slice (B, P or GPB, for example), motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and on other syntax elements received from entropy decoding unit 70. The predictive blocks can be produced from one of the reference images within one of the reference image lists. Video decoder 30 can build the reference frame lists, List 0 and List 1, using default construction techniques based on reference images stored in reference frame memory 82.
[0155] Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (intra- or inter-prediction, for example) used to code the video blocks of the video slice, the inter-prediction slice type (B slice, P slice or GPB slice, for example), construction information for one or more of the reference image lists for the slice, motion vectors for each inter-coded video block of the slice, the inter-prediction status for each inter-coded video block of the slice and other information for decoding the video blocks in the current video slice.
[0156] Motion compensation unit 72 can also perform interpolation based on interpolation filters. Motion compensation unit 72 can use the interpolation filters used by video encoder 20 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 can determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce the predictive blocks.
[0157] In addition, motion compensation unit 72 can be configured to perform any or all of the techniques of this disclosure (alone or in any combination). For example, the motion compensation unit 72 can be configured to perform the BIO techniques discussed here.
[0158] Inverse quantization unit 76 inverse quantizes, that is, de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include the use of a quantization parameter QPY calculated by video decoder 30 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.
[0159] Inverse transform unit 78 applies an inverse transform, such as an inverse DCT, an inverse integer transform or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[0160] After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by adding the residual blocks from inverse transform unit 78 to the corresponding predictive blocks generated by motion compensation unit 72. Adder 80 represents the component or components that perform this summing operation. If desired, a deblocking filter can also be applied to filter the decoded blocks in order to remove blocking artifacts. Other in-loop filters (either in the coding loop or after the coding loop) can also be used to smooth pixel transitions or otherwise improve the video quality. The video blocks decoded in a given frame or image are then stored in reference frame memory 82, which stores the reference images used in subsequent motion compensation. Reference image memory 82 also stores decoded video for later presentation on a display device, such as display device 32 of Figure 1. For example, reference image memory 82 can store decoded images.
[0161] Figure 16 is a flowchart illustrating an example operation of a video decoder for decoding video data according to a technique of this disclosure. The video decoder described with respect to Figure 16 can, for example, be a video decoder, such as video decoder 30, for outputting displayable decoded video, or it can be a video decoder implemented in a video encoder, such as the decoding loop of video encoder 20, which includes inverse quantization unit 58, inverse transform processing unit 60, adder 62 and reference image memory 64, as well as parts of mode selection unit 40.
[0162] According to the techniques of Figure 16, the video decoder determines that a block of video data is encoded using an inter bidirectional prediction mode (200). The video decoder determines a first motion vector for the block that points to a first reference image (202). The video decoder determines a second MV for the block that points to a second reference image, the first reference image being different from the second reference image (204). The video decoder uses the first MV to locate a first predictive block in the first reference image (206). The video decoder uses the second MV to locate a second predictive block in the second reference image (208).
[0163] The video decoder determines a first amount of BIO motion for a first sub-block of the first predictive block (210). The first sub-block can be different from a coding unit, a prediction unit and a transform unit for the block. To determine the first amount of BIO motion, the video decoder can, in some examples, determine the first amount of BIO motion based on samples in the first sub-block and samples outside the first sub-block, and in other examples, determine the first amount of BIO motion based only on samples in the first sub-block. The first amount of BIO motion can include, for example, a motion vector field that includes a horizontal component and a vertical component.
[0164] The video decoder determines a first final predictive sub-block for the video data block based on the first sub-block of the first predictive block, a first sub-block of the second predictive block and the first amount of BIO motion (212). To determine the first final predictive sub-block for the video data block based on the first sub-block of the first predictive block, the first sub-block of the second predictive block and the first amount of BIO motion, the video decoder can determine the first final predictive sub-block using, for example, equation (2) above.
[0165] The video decoder determines a second amount of BIO motion for a second sub-block of the first predictive block (214). The second sub-block can be different from a coding unit, a prediction unit and a transform unit for the block. To determine the second amount of BIO motion, the video decoder may, in some examples, determine the second amount of BIO motion based on samples in the second sub-block and samples outside the second sub-block, and in other examples, determine the second amount of BIO motion based only on samples in the second sub-block. The second amount of BIO motion can, for example, include a motion vector field that includes a horizontal component and a vertical component.
[0166] The video decoder determines a second final predictive sub-block for the video data block based on the second sub-block of the first predictive block, a second sub-block of the second predictive block and the second amount of BIO motion (216). The video decoder can, for example, use equation (2) to determine the second final predictive sub-block for the video data block based on the second sub-block of the first predictive block, the second sub-block of the second predictive block and the second amount of BIO motion.
[0167] The video decoder determines a final predictive block for the video data block based on the first final predictive sub-block and the second final predictive sub-block (218). The video decoder can, for example, add residual data to the final predictive block to determine a reconstructed block for the video data block. The video decoder can also perform one or more filtering processes on the reconstructed block of video data.
[0168] The video decoder produces an image of video data comprising a decoded version of the video data block (220). When the decoding is performed as part of the decoding loop of a video encoding process, the video decoder can, for example, produce the image by storing the image in a reference image memory, and the video decoder can use the image as a reference image in encoding another image of the video data. When the video decoder is a video decoder configured to produce displayable decoded video, the video decoder can, for example, send the image of the video data to a display device.
[0169] It should be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, and can be added, merged or left out altogether (not all described acts or events are necessary for the practice of the techniques, for example). Furthermore, in certain examples, acts or events can be performed concurrently, for example through multi-threaded processing, interrupt processing or multiple processors, rather than sequentially.
[0170] In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted as, one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as a data storage medium, or communication media including any medium that facilitates transfer of a computer program from one place to another, according to a communication protocol, for example. In this way, computer-readable media can generally correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. The data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0171] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transient media, but are instead directed to tangible, non-transient storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0172] Instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other equivalent integrated or discrete logic circuitry. Accordingly, the term processor, as used herein, can refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein can be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0173] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless telephone handset, an integrated circuit (IC) or a set of ICs (a chip set, for example). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units can be combined in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[0174] Several examples have been described.
These and other examples are within the scope of the following claims.
Claims (30)

1. Method of decoding video data, comprising:
determining that a block of video data is encoded using a bidirectional inter-prediction mode;
determining a first motion vector (MV) for the block, in which the first MV points to a first reference image;
determining a second MV for the block, in which the second MV points to a second reference image, the first reference image being different from the second reference image;
locating a first predictive block in the first reference image using the first MV;
locating a second predictive block in the second reference image using the second MV;
determining a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block;
determining a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block, and the first amount of BIO motion;
determining a second amount of BIO motion for a second sub-block of the first predictive block;
determining a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block, and the second amount of BIO motion;
determining a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and
outputting an image of video data comprising a decoded version of the block of video data.
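For illustration, the flow recited in claim 1 can be sketched as a minimal Python program. The helper names (locate_predictive_block, bio_motion), the 4-sample sub-block size, the integer-pel block lookup, and the least-squares motion fit are assumptions of this sketch, not details fixed by the claims; a conforming decoder instead uses fixed-point gradient filters, windowed accumulation, and clipping.

```python
import numpy as np

def locate_predictive_block(ref, pos, mv, size):
    # Hypothetical integer-pel lookup; real codecs interpolate fractional MVs.
    y, x = pos[0] + mv[1], pos[1] + mv[0]   # pos = (row, col), mv = (mvx, mvy)
    return ref[y:y + size, x:x + size].astype(float)

def bio_motion(s0, s1):
    # Illustrative least-squares optical-flow fit per sub-block; BIO in the
    # JEM instead accumulates per-sample gradient products with clipping.
    g0y, g0x = np.gradient(s0)
    g1y, g1x = np.gradient(s1)
    A = np.stack([(g0x + g1x).ravel(), (g0y + g1y).ravel()], axis=1)
    b = (s0 - s1).ravel()
    (vx, vy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return vx, vy, (g0x, g0y, g1x, g1y)

def decode_bi_block(pos, size, mv0, mv1, ref0, ref1, sub=4):
    p0 = locate_predictive_block(ref0, pos, mv0, size)   # first predictive block
    p1 = locate_predictive_block(ref1, pos, mv1, size)   # second predictive block
    pred = np.zeros((size, size))
    for y in range(0, size, sub):
        for x in range(0, size, sub):
            s0 = p0[y:y + sub, x:x + sub]                # first sub-block of p0
            s1 = p1[y:y + sub, x:x + sub]                # co-located sub-block of p1
            vx, vy, (g0x, g0y, g1x, g1y) = bio_motion(s0, s1)
            # predBIO with tau0 = tau1 = 1 (see the equation in claims 9 and 20)
            pred[y:y + sub, x:x + sub] = 0.5 * (s0 + s1
                                                + 0.5 * vx * (g1x - g0x)
                                                + 0.5 * vy * (g1y - g0y))
    return pred                                          # final predictive block

# Toy usage: an 8x8 block at (8, 8) with zero MVs into two random references.
rng = np.random.default_rng(0)
ref0 = rng.integers(0, 256, (32, 32))
ref1 = rng.integers(0, 256, (32, 32))
print(decode_bi_block((8, 8), 8, (0, 0), (0, 0), ref0, ref1).shape)  # (8, 8)
```

Note that, as in the claims, one amount of BIO motion (vx, vy) is derived per sub-block rather than per whole block, which is what allows the two final predictive sub-blocks to differ.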
2. Method according to claim 1, in which determining the first amount of BIO motion comprises determining the first amount of BIO motion based on samples in the first sub-block and samples outside the first sub-block.

3. Method according to claim 1, in which determining the first amount of BIO motion comprises determining the first amount of BIO motion based only on samples in the first sub-block.

4. Method according to claim 1, in which determining the second amount of BIO motion comprises determining the second amount of BIO motion based on samples in the second sub-block and samples outside the second sub-block.

5. Method according to claim 1, in which determining the second amount of BIO motion comprises determining the second amount of BIO motion based only on samples in the second sub-block.
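Claims 2 and 4 (and the parallel apparatus claims 13 and 15) allow the BIO motion amount for a sub-block to be derived from samples outside that sub-block as well. A minimal sketch of one way to realize this, assuming a JEM-style window extended by a hypothetical 2-sample border around the sub-block:

```python
import numpy as np

def subblock_window(pred_block, y, x, sub=4, border=2):
    # Extended window around the sub-block at (y, x), clipped to the block,
    # so the BIO motion estimate can also use samples outside the sub-block.
    # The 2-sample border is an assumption of this sketch, not claim language.
    h, w = pred_block.shape
    y0, y1 = max(0, y - border), min(h, y + sub + border)
    x0, x1 = max(0, x - border), min(w, x + sub + border)
    return pred_block[y0:y1, x0:x1]

block = np.arange(64, dtype=float).reshape(8, 8)
print(subblock_window(block, 4, 4).shape)  # (6, 6): clipped at the block edge
```

Claims 3 and 5 cover the opposite design point, where the estimate uses only the sub-block's own samples, which reduces memory traffic at some cost in estimation robustness.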
6. Method according to claim 1, in which the first amount of BIO motion comprises a motion vector field comprising a horizontal component and a vertical component.

7. Method according to claim 1, in which the first sub-block is different from a coding unit, a prediction unit and a transform unit for the block.

8. Method according to claim 1, further comprising:
adding residual data to the final predictive block to determine a reconstructed block for the block of video data.
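Claims 8 and 19 add the usual reconstruction step. A one-line sketch (floating-point values and no clipping are assumptions of this sketch):

```python
import numpy as np

final_predictive_block = np.full((4, 4), 128.0)            # from the BIO prediction
residual = np.arange(16.0).reshape(4, 4)                   # decoded residual data
reconstructed_block = final_predictive_block + residual    # claim 8 / claim 19
print(reconstructed_block[0, :2])  # [128. 129.]
```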
9. Method according to claim 1, in which determining the first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, the first sub-block of the second predictive block, and the first amount of BIO motion comprises determining the first final predictive sub-block according to the equation:
predBIO = 1/2 · (I(0) + I(1) + vx/2 · (τ1 · ∂I(1)/∂x − τ0 · ∂I(0)/∂x) + vy/2 · (τ1 · ∂I(1)/∂y − τ0 · ∂I(0)/∂y)),
in which predBIO comprises a sample value of the first final predictive sub-block; I(0) comprises a sample value of the first sub-block of the first predictive block; I(1) comprises a sample value of the first sub-block of the second predictive block; vx comprises a horizontal component of the first amount of BIO motion; vy comprises a vertical component of the first amount of BIO motion; τ0 comprises a distance to the first reference image, and τ1 comprises a distance to the second reference image.
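As a worked form of the equation in claims 9 and 20, a small Python function is sketched below; the floating-point arithmetic is an assumption of this sketch, whereas a conforming decoder uses integer shifts and clipping:

```python
def pred_bio(i0, i1, g0x, g0y, g1x, g1y, vx, vy, tau0=1.0, tau1=1.0):
    # predBIO = 1/2 * (I(0) + I(1)
    #                  + vx/2 * (tau1*dI(1)/dx - tau0*dI(0)/dx)
    #                  + vy/2 * (tau1*dI(1)/dy - tau0*dI(0)/dy))
    return 0.5 * (i0 + i1
                  + 0.5 * vx * (tau1 * g1x - tau0 * g0x)
                  + 0.5 * vy * (tau1 * g1y - tau0 * g0y))

# With vx = vy = 0 the result reduces to ordinary bi-prediction:
print(pred_bio(100, 110, 3.0, 1.0, -2.0, 4.0, 0.0, 0.0))  # 105.0
```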

10. Method according to claim 1, in which the decoding method is performed as part of a decoding loop of a video encoding process, in which outputting the image of video data comprising the decoded version of the block of video data comprises storing the image of video data comprising the decoded version of the block of video data in a reference image memory, the method further comprising:
using the image of video data comprising the decoded version of the block of video data as a reference image in encoding another image of the video data.
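Claims 10 and 21 place the method inside the decoding loop of an encoder: the output picture is stored in reference image memory and reused for predicting later pictures. A minimal sketch of that storage-and-reuse pattern, with a hypothetical list-based buffer (real codecs manage reference picture sets rather than a simple sliding window):

```python
class ReferencePictureMemory:
    # Minimal decoded-picture buffer: store output pictures so that later
    # pictures can be predicted from them (claims 10 and 21).
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pictures = []

    def store(self, picture):
        self.pictures.append(picture)
        if len(self.pictures) > self.capacity:
            self.pictures.pop(0)   # naive sliding window, not real RPS logic

    def get(self, index):
        return self.pictures[index]

dpb = ReferencePictureMemory()
dpb.store("decoded picture 0")     # output of the decoding loop
ref = dpb.get(0)                   # reused as a reference for the next picture
print(ref)
```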
11. Method according to claim 1, in which outputting the image of video data comprising the decoded version of the block of video data comprises outputting the image of video data comprising the decoded version of the block of video data to a display device.

12. Device for decoding video data, the device comprising one or more processors configured to:
determine that a block of video data is encoded using a bidirectional inter-prediction mode;
determine a first motion vector (MV) for the block, in which the first MV points to a first reference image;
determine a second MV for the block, in which the second MV points to a second reference image, the first reference image being different from the second reference image;
locate a first predictive block in the first reference image using the first MV;
locate a second predictive block in the second reference image using the second MV;
determine a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block;
determine a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block, and the first amount of BIO motion;
determine a second amount of BIO motion for a second sub-block of the first predictive block;
determine a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block, and the second amount of BIO motion;
determine a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and
output an image of video data comprising a decoded version of the block of video data.

13. Apparatus according to claim 12, in which, to determine the first amount of BIO motion, the one or more processors are configured to determine the first amount of BIO motion based on samples in the first sub-block and samples outside the first sub-block.

14. Apparatus according to claim 12, in which, to determine the first amount of BIO motion, the one or more processors are configured to determine the first amount of BIO motion based only on samples in the first sub-block.

15. Apparatus according to claim 12, in which, to determine the second amount of BIO motion, the one or more processors are configured to determine the second amount of BIO motion based on samples in the second sub-block and samples outside the second sub-block.

16. Apparatus according to claim 12, in which, to determine the second amount of BIO motion, the one or more processors are configured to determine the second amount of BIO motion based only on samples in the second sub-block.

17. Apparatus according to claim 12, in which the first amount of BIO motion comprises a motion vector field comprising a horizontal component and a vertical component.

18. Apparatus according to claim 12, in which the first sub-block is different from a coding unit, a prediction unit and a transform unit for the block.

19. Apparatus according to claim 12, in which the one or more processors are further configured to:
add residual data to the final predictive block to determine a reconstructed block for the block of video data.

20. Apparatus according to claim 12, in which, to determine the first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, the first sub-block of the second predictive block, and the first amount of BIO motion, the one or more processors are configured to determine the first final predictive sub-block according to the equation:
predBIO = 1/2 · (I(0) + I(1) + vx/2 · (τ1 · ∂I(1)/∂x − τ0 · ∂I(0)/∂x) + vy/2 · (τ1 · ∂I(1)/∂y − τ0 · ∂I(0)/∂y)),
in which predBIO comprises a sample value of the first final predictive sub-block; I(0) comprises a sample value of the first sub-block of the first predictive block; I(1) comprises a sample value of the first sub-block of the second predictive block; vx comprises a horizontal component of the first amount of BIO motion; vy comprises a vertical component of the first amount of BIO motion; τ0 comprises a distance to the first reference image, and τ1 comprises a distance to the second reference image.

21. Apparatus according to claim 12, in which the one or more processors decode the video data as part of a decoding loop of a video encoding process, in which, to output the image of video data comprising the decoded version of the block of video data, the one or more processors are configured to store the image of video data comprising the decoded version of the block of video data in a reference image memory, and in which the one or more processors are further configured to:
use the image of video data comprising the decoded version of the block of video data as a reference image in encoding another image of the video data.

22. Apparatus according to claim 12, in which, to output the image of video data comprising the decoded version of the block of video data, the one or more processors are configured to output the image of video data comprising the decoded version of the block of video data to a display device.

23. Apparatus according to claim 12, in which the apparatus comprises a wireless communication device further comprising a receiver configured to receive encoded video data.

24. Apparatus according to claim 23, in which the wireless communication device comprises a telephone, and in which the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.

25. Apparatus according to claim 12, in which the apparatus comprises a wireless communication device further comprising a transmitter configured to transmit encoded video data.

26. Apparatus according to claim 25, in which the wireless communication device comprises a telephone, and in which the transmitter is configured to modulate, according to a wireless communication standard, a signal comprising the encoded video data.

27. Computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
determine that a block of video data is encoded using a bidirectional inter-prediction mode;
determine a first motion vector (MV) for the block, in which the first MV points to a first reference image;
determine a second MV for the block, in which the second MV points to a second reference image, the first reference image being different from the second reference image;
locate a first predictive block in the first reference image using the first MV;
locate a second predictive block in the second reference image using the second MV;
determine a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block;
determine a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block, and the first amount of BIO motion;
determine a second amount of BIO motion for a second sub-block of the first predictive block;
determine a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block, and the second amount of BIO motion;
determine a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and
output an image of video data comprising a decoded version of the block of video data.

28. Computer-readable storage medium according to claim 27, in which the first amount of BIO motion comprises a motion vector field comprising a horizontal component and a vertical component.

29. Computer-readable storage medium according to claim 27, in which the first sub-block is different from a coding unit, a prediction unit and a transform unit for the block.

30. Equipment for decoding video data, comprising:
a device for determining that a block of video data is encoded using a bidirectional inter-prediction mode;
a device for determining a first motion vector (MV) for the block, in which the first MV points to a first reference image;
a device for determining a second MV for the block, in which the second MV points to a second reference image, the first reference image being different from the second reference image;
a device for locating a first predictive block in the first reference image using the first MV;
a device for locating a second predictive block in the second reference image using the second MV;
a device for determining a first amount of bidirectional optical flow (BIO) motion for a first sub-block of the first predictive block;
a device for determining a first final predictive sub-block for the block of video data based on the first sub-block of the first predictive block, a first sub-block of the second predictive block, and the first amount of BIO motion;
a device for determining a second amount of BIO motion for a second sub-block of the first predictive block;
a device for determining a second final predictive sub-block for the block of video data based on the second sub-block of the first predictive block, a second sub-block of the second predictive block, and the second amount of BIO motion;
a device for determining a final predictive block for the block of video data based on the first final predictive sub-block and the second final predictive sub-block; and
a device for producing an image of video data comprising a decoded version of the block of video data.
Similar technologies:
Publication number | Publication date | Patent title
BR112019013684A2|2020-01-28|motion vector reconstructions for bi-directional optical flow (BIO)
RU2705428C2|2019-11-07|Outputting motion information for sub-blocks during video coding
BR112019018689A2|2020-04-07|inter-prediction refinement based on bi-directional optical flow (BIO)
KR20200020722A|2020-02-26|Memory Bandwidth Efficient Design for Bidirectional Optical Flow (BIO)
BR112019019210A2|2020-04-14|restriction of motion vector information derived by decoder-side motion vector derivation
WO2018200960A1|2018-11-01|Gradient based matching for motion search and derivation
BR112020016133A2|2020-12-08|intra-block copy for video encoding
BR112019017252A2|2020-04-14|deriving motion vector information in a video decoder
BR112020006875A2|2020-10-06|low-complexity design for FRUC
BR112021005357A2|2021-06-15|improvements to history-based motion vector predictor
BR112019027821A2|2020-07-07|template matching based on partial reconstruction for motion vector derivation
BR112020014522A2|2020-12-08|improved derivation of motion vector on the decoder side
BR112020021263A2|2021-01-26|MVP derivation limitation based on decoder-side motion vector derivation
Family patents:
Publication number | Publication date
CA3043050A1|2018-07-12|
CN110036638A|2019-07-19|
CO2019007120A2|2019-09-18|
TW201830966A|2018-08-16|
KR20190103171A|2019-09-04|
JP2020503799A|2020-01-30|
US10931969B2|2021-02-23|
WO2018129172A1|2018-07-12|
EP3566441A1|2019-11-13|
AU2018205783A1|2019-05-23|
US20180192072A1|2018-07-05|
CL2019001393A1|2019-09-27|
Cited documents:
Publication number | Application date | Publication date | Applicant | Patent title

EP3332551A4|2015-09-02|2019-01-16|MediaTek Inc.|Method and apparatus of motion compensation for video coding based on bi prediction optical flow techniques|
KR20190018624A|2016-05-13|2019-02-25|VID SCALE, Inc.|Generalized Multi-Hypothesis Prediction System and Method for Video Coding|
CA3025340A1|2016-05-25|2017-11-30|Arris Enterprises Llc|General block partitioning method|
KR102331220B1|2016-12-27|2021-12-01|Panasonic Intellectual Property Corporation of America|Encoding device, decoding device, encoding method, and decoding method|
EP3435673A4|2016-03-24|2019-12-25|Intellectual Discovery Co., Ltd.|Method and apparatus for encoding/decoding video signal|
CA3065492A1|2017-05-17|2018-11-22|Kt Corporation|Method and device for video signal processing|
WO2018212111A1|2017-05-19|2018-11-22|Panasonic Intellectual Property Corporation of America|Encoding device, decoding device, encoding method and decoding method|
JPWO2019003993A1|2017-06-26|2019-12-26|Panasonic Intellectual Property Corporation of America|Encoding device, decoding device, encoding method and decoding method|
JPWO2019155971A1|2018-02-06|2021-01-14|Panasonic Intellectual Property Corporation of America|Encoding device, decoding device, coding method and decoding method|
US11109053B2|2018-03-05|2021-08-31|Panasonic Intellectual Property Corporation Of America|Encoding method, decoding method, encoder, and decoder|
WO2019234600A1|2018-06-05|2019-12-12|Beijing Bytedance Network Technology Co., Ltd.|Interaction between pairwise average merging candidates and intra-block copy (IBC)|
EP3804324A1|2018-06-11|2021-04-14|Mediatek Inc.|Method and apparatus of bi-directional optical flow for video coding|
TWI739120B|2018-06-21|2021-09-11|Beijing Bytedance Network Technology Co., Ltd.|Unified constraints for the merge affine mode and the non-merge affine mode|
GB2589223A|2018-06-21|2021-05-26|Beijing Bytedance Network Tech Co Ltd|Component-dependent sub-block dividing|
US11245922B2|2018-08-17|2022-02-08|Mediatek Inc.|Shared candidate list|
BR112021002857A2|2018-08-17|2021-05-11|Mediatek Inc.|Bidirectional prediction video processing methods and apparatus in video encoding systems|
WO2020035029A1|2018-08-17|2020-02-20|Mediatek Inc.|Method and apparatus of simplified sub-mode for video coding|
US11146800B2|2018-09-24|2021-10-12|Tencent America LLC|Low latency local illumination compensation|
WO2020065518A1|2018-09-24|2020-04-02|Beijing Bytedance Network Technology Co., Ltd.|Bi-prediction with weights in video coding and decoding|
TW202029755A|2018-09-26|2020-08-01|Vid Scale, Inc.|Bi-prediction for video coding|
WO2020084460A1|2018-10-22|2020-04-30|Beijing Bytedance Network Technology Co., Ltd.|Decoder side motion vector derivation in the presence of multi-hypothesis prediction|
WO2020084474A1|2018-10-22|2020-04-30|Beijing Bytedance Network Technology Co., Ltd.|Gradient computation in bi-directional optical flow|
JP2022506161A|2018-11-05|2022-01-17|Beijing Bytedance Network Technology Co., Ltd.|Interpolation for inter-prediction with refinement|
CN113170097A|2018-11-20|2021-07-23|Beijing Bytedance Network Technology Co., Ltd.|Coding and decoding of video coding and decoding modes|
WO2020103943A1|2018-11-22|2020-05-28|Beijing Bytedance Network Technology Co., Ltd.|Using collocated blocks in sub-block temporal motion vector prediction mode|
KR20200140373A|2018-11-30|2020-12-15|Tencent America LLC|Method and apparatus for video coding|
WO2020125804A1|2018-12-21|2020-06-25|Beijing Bytedance Network Technology Co., Ltd.|Inter prediction using polynomial model|
CN113632484A|2019-03-15|2021-11-09|Beijing Dajia Internet Information Technology Co., Ltd.|Method and apparatus for bit width control of bi-directional optical flow|
CN113596479A|2019-06-21|2021-11-02|Hangzhou Hikvision Digital Technology Co., Ltd.|Encoding and decoding method, device and equipment|
WO2021054886A1|2019-09-20|2021-03-25|Telefonaktiebolaget LM Ericsson (publ)|Methods of video encoding and/or decoding with bidirectional optical flow simplification on shift operations and related apparatus|
CN112868236A|2019-09-24|2021-05-28|Peking University|Video processing method and device|
WO2020256601A2|2019-10-03|2020-12-24|Huawei Technologies Co., Ltd.|Method and apparatus of picture-level signaling for bidirectional optical flow and decoder side motion vector refinement|
Legal status:
2021-10-13| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Application date | Patent title
US201762442357P|2017-01-04|
US201762445152P|2017-01-11|
US15/861,515|US10931969B2|2017-01-03|Motion vector reconstructions for bi-directional optical flow (BIO)|
PCT/US2018/012360|WO2018129172A1|2018-01-04|Motion vector reconstructions for bi-directional optical flow (BIO)|